OM nucleic - nucleic search, using sw model

Run on: February 26, 2004, 00:47:23 ; Search time 3952.5 Seconds 

(without alignments) 
17679.337 Million cell updates/sec 

Title: US-09-989-981A-5 
Perfect score: 2340 

Sequence: 1 gtcaggtggagcaggcaggg aatattcataaacctatggg 2340 

Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0

Searched: 27513289 seqs, 14931090276 residues 

Total number of hits satisfying chosen parameters: 55026578 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
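
The header above names a Smith-Waterman ("sw") search with the IDENTITY_NUC table, a gap-open value of 10 and a gap-extension value of 1. The following is a minimal sketch of local alignment scoring with affine gaps, for illustration only; it is not the search engine's implementation. The function name `smith_waterman`, the +1 match score, the -0.6 mismatch score, and the convention that a gap of k residues costs Gapop + k * Gapext are assumptions that this report does not state.

```python
def smith_waterman(query, target, match=1.0, mismatch=-0.6,
                   gap_open=10.0, gap_ext=1.0):
    """Return the best local alignment score (Gotoh-style affine gaps).

    Assumed convention: a gap of k residues costs gap_open + k * gap_ext.
    All scoring values are illustrative defaults, not taken from the tool.
    """
    neg_inf = float("-inf")
    m = len(target)
    h_prev = [0.0] * (m + 1)      # best score ending at the previous query row
    f_prev = [neg_inf] * (m + 1)  # best score ending in a vertical gap
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, m + 1):
            # Extend an open gap, or open a new one from the best H cell.
            e = max(e - gap_ext, h_cur[j - 1] - gap_open - gap_ext)
            f_cur[j] = max(f_prev[j] - gap_ext, h_prev[j] - gap_open - gap_ext)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    x = "ACGTTGCAAGCTTAGGCTAACCGTATGCAT"
    y = "TTGACCGATAGGCTCAATGCGTACCTGAAG"
    # 60 matches minus one 4-residue gap (10 + 4 * 1) under the assumed costs.
    print(smith_waterman(x + y, x + "GGGG" + y))
```

With these assumed costs a gap only pays off when the matching flanks are long, which is why short, gap-rich hits do not appear near the top of a listing like this one.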

Database : EST:*
1: em_estba:*
2: em_esthum:*
3: em_estin:*
4: em_estmu:*
5: em_estov:*
6: em_estpl:*
7: em_estro:*
8: em_htc:*
9: gb_est1:*
10: gb_est2:*
11: gb_htc:*
12: gb_est3:*
13: gb_est4:*
14: gb_est5:*
15: em_estfun:*
16: em_estom:*
17: em_gss_hum:*
18: em_gss_inv:*
19: em_gss_pln:*
20: em_gss_vrt:*
21: em_gss_fun:*
22: em_gss_mam:*
23: em_gss_mus:*
24: em_gss_pro:*
25: em_gss_rod:*
26: em_gss_phg:*
27: em_gss_vrl:*
28: gb_gss1:*
29: gb_gss2:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 
 Result           Query
    No.   Score   Match %  Length  DB  ID         Description
      4   463.2    19.8      713   10  BB598373   BB598373 BB598373
 c    5   429      18.3      432    9  AI033358   AI033358 ox02f10.s
     21            10.0            14
 c   22            10.0            14  T86285
     25   206       8.8      502    9  AA237916   AA237916 mx14e08.r
     26   206       8.8      535    9  AA244605   AA244605 mx02d10.r
 c   27   203.6     8.7      706   29  AG094162   AG094162 Pan trogl
     38                            14  CB812866   CB812866 AMGNNUC:T
                                       CG262933   CG262933 OG1DH53TV
                    6.1      442    9  AA217272   AA217272 mu89f07.r
     41             6.1            29  AG122753   AG122753 Pan trogl
     42   139.8     6.0
     43             5.9

ALIGNMENTS



RESULT 1
AV689089
LOCUS       AV689089     594 bp    mRNA    linear   EST 16-JAN-2002
DEFINITION  AV689089 GKC Homo sapiens cDNA clone GKCDZB07 5', mRNA sequence.
ACCESSION   AV689089
VERSION     AV689089.1  GI:10290952
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 594)
  AUTHORS   Xu,X., Huang,J., Xu,Z., Qian,B., Zhu,Z., Yan,Q., Cai,T., Zhang,X.,
            Xiao,H., Qu,J., Liu,F., Huang,Q., Cheng,Z., Li,N., Du,J., Hu,W.,
            Shen,K., Lu,G., Fu,G., Zhong,M., Xu,S., Gu,W., Huang,W., Zhao,X.,
            Hu,G., Gu,J., Chen,Z. and Han,Z.
  TITLE     Insight into hepatocellular carcinogenesis at transcriptome level
            by comparing gene expression profiles of hepatocellular carcinoma
            with those of corresponding noncancerous liver
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 98 (26), 15089-15094 (2001)
  MEDLINE   21625106
   PUBMED   11752456
COMMENT     Contact: Zeguang Han
            Chinese National Human Genome Center at Shanghai
            351 Guo Shoujing Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai
            201203, P. R. China
            Tel: 86-21-50801919(ex.45)
            Fax: 86-21-50801922
            Email: hanzg@chgc.sh.cn
            This clone is available at CHGC in Shanghai.
FEATURES             Location/Qualifiers
     source          1..594
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="GKCDZB07"
                     /tissue_type="hepatocellular carcinoma"
                     /dev_stage="Adult"
                     /lab_host="SOLR"
                     /clone_lib="GKC"
                     /note="Vector: pBluescript sk(-); Site_1: EcoRI; Site_2:
                     XhoI"
ORIGIN

  Query Match             23.2%;  Score 544;  DB 9;  Length 594;
  Best Local Similarity   98.9%;  Pred. No. 2.7e-118;
  Matches  558;  Conservative    0;  Mismatches    2;  Indels    4;  Gaps    1;



Qy   1246 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 1305
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 60

Qy   1306 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 1365
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 120

Qy   1366 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 1425
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 180

Qy   1426 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 1485
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 240

Qy   1486 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 1545
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 300

Qy   1546 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 1605
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 360

Qy   1606 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCT 1665
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCT 420

Qy   1666 ACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGC 1725
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 ACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGC 480

Qy   1726 GGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAA 1785
          || |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 GGNGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT-- 538

Qy   1786 AATCATCAGTTATTTTACATTCCA 1809
            |||||||||||| |||||||||
Db    539 --TCATCAGTTATTNTACATTCCA 560
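
The statistics reported for RESULT 1 can be cross-checked against the Perfect score of 2340 from the header. The sketch below is a hedged illustration: the function name `report_stats`, the mismatch penalty of 0.6, and the zero score for columns containing the ambiguity code N are assumptions inferred from the numbers in this listing, not values the report states.

```python
def report_stats(matches, mismatches, indels, gaps, perfect_score,
                 ambiguous=0, mismatch_penalty=0.6,
                 gap_open=10.0, gap_ext=1.0):
    """Recompute Score, Query Match and Best Local Similarity.

    `ambiguous` is the number of mismatch columns that involve an N and
    are assumed to score 0 instead of the mismatch penalty.
    """
    columns = matches + mismatches + indels          # alignment length
    score = (matches
             - mismatch_penalty * (mismatches - ambiguous)
             - gap_open * gaps                       # one charge per gap
             - gap_ext * indels)                     # one charge per gapped residue
    return {
        "score": round(score, 1),
        "query_match": round(100.0 * score / perfect_score, 1),
        "best_local_similarity": round(100.0 * matches / columns, 1),
    }


if __name__ == "__main__":
    # RESULT 1: Matches 558; Mismatches 2 (both N); Indels 4; Gaps 1.
    print(report_stats(558, 2, 4, 1, 2340, ambiguous=2))
```

Under these assumptions Query Match is Score divided by Perfect score, and Best Local Similarity is Matches divided by the number of alignment columns.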



RESULT 2
AV694671
LOCUS       AV694671     597 bp    mRNA    linear   EST 16-JAN-2002
DEFINITION  AV694671 GKC Homo sapiens cDNA clone GKCDZG05 5', mRNA sequence.
ACCESSION   AV694671
VERSION     AV694671.1  GI:10296534
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 597)
  AUTHORS   Xu,X., Huang,J., Xu,Z., Qian,B., Zhu,Z., Yan,Q., Cai,T., Zhang,X.,
            Xiao,H., Qu,J., Liu,F., Huang,Q., Cheng,Z., Li,N., Du,J., Hu,W.,
            Shen,K., Lu,G., Fu,G., Zhong,M., Xu,S., Gu,W., Huang,W., Zhao,X.,
            Hu,G., Gu,J., Chen,Z. and Han,Z.
  TITLE     Insight into hepatocellular carcinogenesis at transcriptome level
            by comparing gene expression profiles of hepatocellular carcinoma
            with those of corresponding noncancerous liver
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 98 (26), 15089-15094 (2001)
  MEDLINE   21625106
   PUBMED   11752456
COMMENT     Contact: Zeguang Han
            Chinese National Human Genome Center at Shanghai
            351 Guo Shoujing Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai
            201203, P. R. China
            Tel: 86-21-50801919(ex.45)
            Fax: 86-21-50801922
            Email: hanzg@chgc.sh.cn
            This clone is available at CHGC in Shanghai.
FEATURES             Location/Qualifiers
     source          1..597
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="GKCDZG05"
                     /tissue_type="hepatocellular carcinoma"
                     /dev_stage="Adult"
                     /lab_host="SOLR"
                     /clone_lib="GKC"
                     /note="Vector: pBluescript sk(-); Site_1: EcoRI; Site_2:
                     XhoI"
ORIGIN

  Query Match             22.3%;  Score 521;  DB 9;  Length 597;
  Best Local Similarity   99.1%;  Pred. No. 8e-113;
  Matches  535;  Conservative    0;  Mismatches    1;  Indels    4;  Gaps    1;



Qy   1246 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 1305
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 60

Qy   1306 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 1365
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 120

Qy   1366 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 1425
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 180

Qy   1426 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 1485
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 240

Qy   1486 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 1545
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 300

Qy   1546 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 1605
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 360

Qy   1606 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCT 1665
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCT 420

Qy   1666 ACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGC 1725
          ||||||||||||||    ||||||||||||||||||||||||||||||||||||||||||
Db    421 ACTTGGTATCGTCC----TCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGC 476

Qy   1726 GGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAA 1785
          || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    477 GGNGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAA 536



RESULT 3
AV720911/C
LOCUS       AV720911     477 bp    mRNA    linear   EST 16-OCT-2000
DEFINITION  AV720911 GLC Homo sapiens cDNA clone GLCETC06 5', mRNA sequence.
ACCESSION   AV720911
VERSION     AV720911.1  GI:10818063
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 477)
  AUTHORS   Qian,B., Wu,T., Huang,Q., Huang, Kang,B., Gao,X., Xu,Z.,
            Xiao,H., Xu,X., Li,N., Peng,Y., Liu,F., Qu,J., Song,H., Cheng,Z.,
            Zeng,L., Xu,S., Gu,W., Tu,Y., Jia,J., Fu,G., Ren,S., Zhong,M.,
            Lu,G., Yang,Y., Gu,Y., Chen,Z. and Han,Z.
  TITLE     Homo sapiens cDNA GLC clones
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Zeguang Han
            Chinese National Human Genome Center at Shanghai
            351 Guo Shoujing Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai
            201203, P. R. China
            Tel: 86-21-50801919(ex.45)
            Fax: 86-21-50801922
            Email: hanzg@chgc.sh.cn
            This clone is available at CHGC in Shanghai.
FEATURES             Location/Qualifiers
     source          1..477
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="GLCETC06"
                     /tissue_type="corresponding non cancerous liver tissue"
                     /dev_stage="Adult"
                     /lab_host="SOLR"
                     /clone_lib="GLC"
                     /note="Vector: pBluescript sk(-); Site_1: EcoRI; Site_2:
                     XhoI"
ORIGIN

  Query Match             20.4%;  Score 477;  DB 9;  Length 477;
  Best Local Similarity   100.0%;  Pred. No. 2.2e-102;
  Matches  477;  Conservative    0;  Mismatches    0;  Indels    0;  Gaps    0;



Qy   1843 GTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAAT 1902
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    477 GTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAAT 418

Qy   1903 GTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAG 1962
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    417 GTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAG 358

Qy   1963 ATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAAT 2022
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    357 ATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAAT 298

Qy   2023 AGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAA 2082
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    297 AGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAA 238

Qy   2083 ATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCAT 2142
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    237 ATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCAT 178

Qy   2143 GTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCT 2202
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    177 GTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCT 118

Qy   2203 CTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAA 2262
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    117 CTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAA 58

Qy   2263 CTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAA 2319
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     57 CTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAA 1
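
RESULT 3 is labeled AV720911/C and its Db coordinates run from 477 down to 1, which suggests that the hit lies on the complementary strand of the database entry. The sketch below shows how a displayed Db segment could be turned back into the entry's forward-strand bases. This reading of the `/C` marker is an assumption, and the helper name `reverse_complement` is hypothetical.

```python
# Complement table for the bases that occur in this listing, including N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First eight displayed bases of the last Db row of RESULT 3 (Db 57..50).
    shown = "CTTGCAGG"
    # Under the assumption above, these are bases 50..57 of the entry.
    print(reverse_complement(shown))
```

Applying the function twice returns the original string, which is a quick sanity check when converting coordinates by hand.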



RESULT 4
BB598373
LOCUS       BB598373     713 bp    mRNA    linear   EST 26-OCT-2001
DEFINITION  BB598373 RIKEN full-length enriched, adult male liver tumor Mus
            musculus cDNA clone C730003G04 5', mRNA sequence.
ACCESSION   BB598373
VERSION     BB598373.2  GI:16450340
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 713)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished (2001)
COMMENT     On Dec 1, 2000 this sequence version replaced gi:11506974.
            Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system - 384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
            Hayashizaki,Y.
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome 12, 673-677 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
FEATURES             Location/Qualifiers
     source          1..713
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="C730003G04"
                     /sex="male"
                     /tissue_type="liver tumor"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, adult male liver
                     tumor"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3']. cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA
                     was cleaved with BamHI and XhoI. Vector: a modified
                     pBluescript KS(+) after bulk excision from Lambda FLC I.
                     Tissue was provided by William A. Held, Roswell Park
                     Cancer Institute, Department of Molecular and Cellular
                     Biology, Elm and Carlton Streets, Buffalo, NY 14263, whose
                     assistance we gratefully acknowledge."
ORIGIN

  Query Match             19.8%;  Score 463.2;  DB 10;  Length 713;
  Best Local Similarity   79.3%;  Pred. No. 5.1e-99;
  Matches  562;  Conservative    0;  Mismatches  144;  Indels    3;  Gaps    1;

Qy     90 TTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTC 149
          |||||| |||| | ||||||||||| ||  | | | |||  || |  ||  |||  || |
Db      2 TTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGC 61

Qy    150 TCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGC 209
            ||  | |||||||| || | |||||||||||||   | | | | ||| | | | ||| 
Db     62 CTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGG 121

Qy    210 CT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCT 266
          ||   |||||| | ||  |||| ||||  ||||||||||||||| |||| ||  |||| |
Db    122 CTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTT 181

Qy    267 GGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCT 326
          ||||| |||||| ||| |||| |||| |||||  |||||| |||||||||||||||||||
Db    182 GGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCT 241

Qy    327 TGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCA 386
          ||||| | ||||| || ||||| ||||||||| |||| ||||||||||| ||||| ||||
Db    242 TGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCA 301

Qy    387 CGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGT 446
          |||||||||||||||| |||||||||||| ||||| | |||||| |    ||||||||||
Db    302 CGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGT 361

Qy    447 ATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGC 506
           |||||| ||| | | ||||||| |||| |||||||| ||||||||||||||||||||||
Db    362 TTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGC 421

Qy    507 AGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGC 566
          ||||||||    | |||||||||||||| |||||||||||| |||  ||||| ||| |||
Db    422 AGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGC 481

Qy    567 TGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAG 626
          ||||| || ||||| ||    | | || || | | |||||||| ||||| |||||| |||
Db    482 TGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAG 541

Qy    627 AGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTT 686
          ||||||| |||||||| ||||| |||| | ||||||||| ||| |  || ||||| ||||
Db    542 AGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTT 601

Qy    687 CCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCA 746
          |||  || ||||||||||| || |||||||||||||| || || ||||| ||  ||||||
Db    602 CCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCCAGGTCA 661

Qy    747 TGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCA 795
          || || | ||||||||||||||||| ||||||||||||      |||||
Db    662 TGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGNACTGCAATCA 710



RESULT 5
AI033358/C
LOCUS       AI033358     432 bp    mRNA    linear   EST 25-JUN-1998
DEFINITION  ox02f10.s1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA
            clone IMAGE:1655179 3', mRNA sequence.
ACCESSION   AI033358
VERSION     AI033358.1  GI:3254311
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 432)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Seq primer: -40m13 fwd. ET from Amersham
            High quality sequence stop: 364.
FEATURES             Location/Qualifiers
     source          1..432
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:1655179"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
                     /clone_lib="Soares_fetal_liver_spleen_1NFLS_S1"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     This is a subtracted version of the original Soares fetal
                     liver spleen 1NFLS library. 1st strand cDNA was primed
                     with a Pac I - oligo(dT) primer [5'
                     AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'];
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
ORIGIN

  Query Match             18.3%;  Score 429;  DB 9;  Length 432;
  Best Local Similarity   100.0%;  Pred. No. 5.7e-91;
  Matches  429;  Conservative    0;  Mismatches    0;  Indels    0;  Gaps    0;



Qy   1908 CCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCA 1967
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    432 CCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCA 373

Qy   1968 CAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTG 2027
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    372 CAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTG 313

Qy   2028 TTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGA 2087
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    312 TTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGA 253

Qy   2088 AGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATT 2147
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    252 AGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATT 193

Qy   2148 TCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGG 2207
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    192 TCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGG 133

Qy   2208 ATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGC 2267
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    132 ATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGC 73

Qy   2268 AGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTC 2327
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     72 AGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTC 13

Qy   2328 ATAAACCTA 2336
          |||||||||
Db     12 ATAAACCTA 4



RESULT 6
AI140253/C
LOCUS       AI140253     418 bp    mRNA    linear   EST 29-OCT-1998
DEFINITION  qe21a04.x1 Soares_fetal_lung_NbHL19W Homo sapiens cDNA clone
            IMAGE:1739598 3', mRNA sequence.
ACCESSION   AI140253
VERSION     AI140253.1  GI:3647710
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 418)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Insert Length: 1828 Std Error: 0.00
            Seq primer: -40m13 fwd. ET from Amersham
            High quality sequence stop: 417.
FEATURES             Location/Qualifiers
     source          1..418
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:1739598"
                     /dev_stage="19 weeks"
                     /lab_host="DH10B (ampicillin resistant)"
                     /clone_lib="Soares_fetal_lung_NbHL19W"
                     /note="Organ: lung; Vector: pT7T3D (Pharmacia) with a
                     modified polylinker; Site_1: Not I; Site_2: Eco RI; 1st
                     strand cDNA was primed with a Not I - oligo(dT) primer
                     [5'-TGTTACCAATCTGAAGTGGGAGCGGCCGCAATTTTTTTTTTTTTTTTTT-3'],
                     double-stranded cDNA was size selected, ligated to Eco RI
                     adapters (Pharmacia), digested with Not I and cloned into
                     the Not I and Eco RI sites of a modified pT7T3 vector
                     (Pharmacia). Library went through one round of
                     normalization to a Cot = 5. Library constructed by Bento
                     Soares and M.Fatima Bonaldo. This library was constructed
                     from the same fetus as the fetal heart library, Soares
                     fetal heart NbHH19W."
ORIGIN

  Query Match             17.9%;  Score 418;  DB 9;  Length 418;
  Best Local Similarity   100.0%;  Pred. No. 2.3e-88;
  Matches  418;  Conservative    0;  Mismatches    0;  Indels    0;  Gaps    0;

Qy   1918 AGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTT 1977
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    418 AGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTT 359

Qy   1978 TCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAAT 2037
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    358 TCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAAT 299

Qy   2038 AAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTG 2097
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    298 AAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTG 239

Qy   2098 CCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGA 2157
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    238 CCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGA 179

Qy   2158 CAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAG 2217
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    178 CAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAG 119

Qy   2218 GCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGT 2277
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    118 GCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGT 59

Qy   2278 GGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCT 2335
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     58 GGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCT 1
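
Each hit in this listing carries the same block of `name value;` statistics (Query Match, Score, DB, Length, Best Local Similarity, Pred. No., Matches, Conservative, Mismatches, Indels, Gaps). The following is a small parsing sketch for that block, offered as one possible approach; the function name `parse_stats` and the regular expression are illustrative and not part of the search software.

```python
import re

# One alternative per field name used in the statistics block of this listing.
FIELD = re.compile(
    r"(Query Match|Best Local Similarity|Pred\. No\.|Score|DB|Length|"
    r"Matches|Conservative|Mismatches|Indels|Gaps)"
    r"\s+([-+0-9.eE]+)\s*%?\s*;"
)


def parse_stats(text):
    """Map each statistic name to a float; percent signs are dropped."""
    return {name: float(value) for name, value in FIELD.findall(text)}


if __name__ == "__main__":
    block = """
    Query Match 17.9%; Score 418; DB 9; Length 418;
    Best Local Similarity 100.0%; Pred. No. 2.3e-88;
    Matches 418; Conservative 0; Mismatches 0; Indels 0; Gaps 0;
    """
    stats = parse_stats(block)
    print(stats["Score"], stats["Pred. No."])
```

Returning floats for every field keeps the sketch short; a stricter parser might keep DB, Length and the count fields as integers.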



RESULT 7 
BF162656 

LOCUS BF162656 936 bp mRNA linear EST 30-OCT-2000
DEFINITION 601769307F1 NCI_CGAP_Lu29 Mus musculus cDNA clone IMAGE:3988777 5',
mRNA sequence.
ACCESSION BF162656
VERSION BF162656.1 GI:11042879
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 936)
AUTHORS NIH-MGC http://mgc.nci.nih.gov/.
TITLE National Institutes of Health, Mammalian Gene Collection (MGC)
JOURNAL Unpublished (1999)
COMMENT Contact: Robert Strausberg, Ph.D.
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Gilbert Smith, Ph.D.
cDNA Library Preparation: Life Technologies, Inc.
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Incyte Genomics, Inc.
Clone distribution: MGC clone distribution information can be
found through the I.M.A.G.E. Consortium/LLNL at:
http://image.llnl.gov
Plate: LLAM9197 row: m column: 02
High quality sequence stop: 686.
FEATURES Location/Qualifiers
source 1..936
/organism="Mus musculus"
/mol_type="mRNA"
/strain="Czech II"
/db_xref="taxon:10090"
/clone="IMAGE:3988777"
/tissue_type="spontaneous tumor, metastatic to mammary.
Stem cell origin."
/lab_host="DH10B"
/clone_lib="NCI_CGAP_Lu29"
/note="Organ: lung; Vector: pCMV-SPORT6; Site_1: SalI;
Site_2: NotI; Cloned unidirectionally. Primer: Oligo dT.
Library constructed by Life Technologies. Investigator
providing samples: Gilbert Smith, NIH"



ORIGIN 



Query Match 17.4%; Score 407.4; DB 10; Length 936;

Best Local Similarity 75.2%; Pred. No. 1.1e-85;

Matches 534; Conservative 0; Mismatches 172; Indels 4; Gaps 2;



Qy 605 AAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGC 664

I III Mill II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I M I II I I
Db 127 AGGGTAGAGGCAGTCATGACAGAGCTGAGTCTGAGCCACGTGGCGGACCAAATGATTGGC 186



Qy 665 AACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAG 724

I III I II Mill II I I I I I II II II I I II I I I II I I I I I I I I I I I I M
Db 187 AGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAA 246

Qy 725 CTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATG 784

II II I I II I II M I I I II I I II I I M I I I II I I M I M I I M I I M I I I I I I
Db 247 CTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATG 306



Qy 785 ACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTT 844



Db 



307 



366 



Qy 845 CTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTG 904 

I I I I I I I I I I I I I I I II I I II II II II II I III I I I I I I I I I I II I I I I II 
Db 367 GTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTG 426

Qy 905 AGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGAC 964 

I I I I I I I M I I II II I I I I I M I I I I II I I I I I M I I I I I I I I I I M 

Db 427 ACTTACGGAGAGTNGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAAC 486 

Qy 965 TGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCA 1024

II II I I I M r I I I I I I II I I I II I II II I I I I I II II I I I I I I I I I I III 

Db 487 TGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCA 546 

Qy 1025 GTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAA 1084

I I I I I II I I I I I I II II I I I I I I I I I I I I M I MM I I I I II I I I I I II 
Db 547 GTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAC 606 

Qy 1085 TCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAG--AATATTGAAAGAATGA 1142

I I I M I I I M I I II I II I I II I II I I II I I I I I I I I
Db 607 TGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTGGAGACACATTGCACAGAGCAC 666

Qy 1143 AACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCT 1202

I I I M II II I I III I I I II I II I II II I I II III II I II I I I I III
Db 667 GATACCTGAAAACCTTAACCACGGTTCCTTTCAAAACAAAAAGATCTCCTGGGATGTTCG 726

Qy 1203 CTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACT--TGGTGAGAAATAAGCTGGC 1260

I I I I I I I I I I I II II I I I II I I I II I III 

Db 727 GCCAGCTTGGGGTCCTGGAGAGGGAATTACAAGAAACCTCCACGCGCGAATAAGCACGGC 786 

Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTT 1310 

I I I I I I I I I II I I II I I II I I I I I I 

Db 787 ACGGATAAATGCGCCACGGCAGAACTCGGTCACGGGCCTTCACCACATAT 836



RESULT 8 
BY742680 
LOCUS BY742680 658 bp mRNA linear EST 17-DEC-2002
DEFINITION BY742680 RIKEN full-length enriched, adult male liver tumor Mus
musculus cDNA clone C730040P06 5', mRNA sequence.
ACCESSION BY742680
VERSION BY742680.1 GI:27168376
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 658)
AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
Rogers,J., Birney,E. and Hayashizaki,Y.

TITLE Analysis of the mouse transcriptome based on functional annotation 

of 60,770 full-length cDNAs 

JOURNAL Nature 420, 563-573 (2002) 

MEDLINE 22354683 
PUBMED 12466851
COMMENT Contact: Yoshihide Hayashizaki 

Laboratory for Genome Exploration Research Group, RIKEN Genomic 

Sciences Center (GSC), Yokohama Institute 

The Institute of Physical and Chemical Research (RIKEN) 

1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan 

Tel: 81-45-503-9222 

Fax: 81-45-503-9216 

Email: genome-res@gsc.riken.go.jp

URL: http://genome.gsc.riken.go.jp/

Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., 
Fukuda,S., Hashizume, W. , Hayashida, K. , Hirozane,T., Hori,F., 
Imotani,K., Ishii,Y., Itoh,M. , Kagawa, I., Kawai,J., Kojima,Y., 
Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M,, 
Nomura, K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N., 
Sano,H., Sasaki, D., Sato,K., Shibata,K., Shiraki,T., Tagami,M. , 
Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki, Y. 
Direct Submission 

Computational Analysis of Full-Length Mouse cDNAs Compared with 
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001) 

Normalization and subtraction of cap-trapper-selected cDNAs to 
prepare full-length cDNA libraries for rapid discovery of new 
genes. Genome Res. 10 (10), 1617-1630 (2000) 

RIKEN integrated sequence analysis (RISA) system — 384-format 
sequencing pipeline with 384 multicapillary sequencer. Genome Res. 
10 (11), 1757-1771 (2000) 

Computer-based methods for the mouse full-length cDNA 
encyclopedia: real-time sequence clustering for construction of a 
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001) 

cDNA library was prepared and sequenced in Mouse Genome 
Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 



Tissue was provided by William A. Held, Roswell Park Cancer
Institute, Department of Molecular and Cellular Biology, Elm and
Carlton Streets, Buffalo, NY 14263, whose assistance we gratefully
acknowledge.

Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.
FEATURES Location/Qualifiers
source 1..658
/organism="Mus musculus"
/mol_type="mRNA"
/db_xref="taxon:10090"
/clone="C730040P06"
/sex="male"
/tissue_type="liver tumor"
/dev_stage="adult"
/lab_host="DH10B"
/clone_lib="RIKEN full-length enriched, adult male liver
tumor"
/note="Site_1: SalI; Site_2: BamHI; cDNA library was
prepared and sequenced in Mouse Genome Encyclopedia
Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in
RIKEN. Division of Experimental Animal Research in Riken
contributed to prepare mouse tissues. 1st strand cDNA was
primed with a primer [5'
GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse
transcriptase and subsequently enriched for full-length by
cap-trapper. Second strand cDNA was prepared with the
primer adapter of sequence [5'
GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA
was cleaved with BamHI and XhoI. Vector: a modified
pBluescript KS(+) after bulk excision from Lambda FLC I.
Tissue was provided by William A. Held, Roswell Park
Cancer Institute, Department of Molecular and Cellular
Biology, Elm and Carlton Streets, Buffalo, NY 14263, whose
assistance we gratefully acknowledge."



ORIGIN 



Query Match 16.9%; Score 396.4; DB 13; Length 658;

Best Local Similarity 78.2%; Pred. No. 4e-83;

Matches 513; Conservative 0; Mismatches 138; Indels 5; Gaps 3;



Qy 90 TTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTC 149

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III III
Db 2 TTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGC 61

Qy 150 TCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGC 209

II I M II II I I II I I I I I I II I I I I II I I I I I I I I I I I II
Db 62 CTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGG 121

Qy 210 CT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCT 266

II I I II II I I I M II I M I I I M I I I I I M I I I I I I I I I I I I I I I
Db 122 CTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTT 181

Qy 267 GGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCT 326

I I II I II I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I
Db 182 GGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCT 241

Qy 327 TGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCA 386

I I I I I I I I I I I II I II I I I M I I I I I I I II I I I II I I I M II I I II I MM 
Db 242 TGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCA 301 

Qy 387 CGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGT 446

I II II II II II II II I I II I II M II II II I II I I II I I I I I II II II II I 

Db 302 CGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGT 361 

Qy 447 ATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGC 506 

II II II III I I II II II I I II I II I I II M M II I I I I M I II II I I II II I 
Db 362 TTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGC 421 

Qy 507 AGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGC 566 

I II II II I I I I I II II II II I M II II II II II I I III II M I III Ml 

Db 422 AGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGC 481 

Qy 567 TGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAG 626 

Mill II Mill II I I II II I I II I II I I I II II I II I II I III 

Db 482 TGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAG 541

Qy 627 AGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTT 686 

I M I II I II M II I I I I II I I II I II M II I II II I I I I I II II M I 
Db 542 AGCTGAGCCTGAGCCACGTGGCGGACCANATGATTGGCAGCTAT7KAATTTGGGGG-ATNT 600 

Qy 687 CCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAG 742 

III II II I I II I II II II II II M II I I II II M II I I I I I 
Db 601 CCAGTGGCGAGCGGCGCCGAGT-TCCATCGCAGCCCAACTCTTCAGGACCCCAAGG 655 



RESULT 9 
AV695922 
LOCUS AV695922 417 bp mRNA linear EST 16-JAN-2002
DEFINITION AV695922 GKC Homo sapiens cDNA clone GKCDWE04 5', mRNA sequence.
ACCESSION AV695922
VERSION AV695922.1 GI:10297785
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 417)
AUTHORS Xu,X., Huang,J., Xu,Z., Qian,B., Zhu,Z., Yan,Q., Cai,T., Zhang,X.,
Xiao,H., Qu,J., Liu,F., Huang,Q., Cheng,Z., Li,N., Du,J., Hu,W.,
Shen,K., Lu,G., Fu,G., Zhong,M., Xu,S., Gu,W., Huang,W., Zhao,X.,
Hu,G., Gu,J., Chen,Z. and Han,Z.
TITLE Insight into hepatocellular carcinogenesis at transcriptome level
by comparing gene expression profiles of hepatocellular carcinoma
with those of corresponding noncancerous liver
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 98 (26), 15089-15094 (2001)
MEDLINE 21625106
PUBMED 11752456
COMMENT Contact: Zeguang Han
Chinese National Human Genome Center at Shanghai
351 Guo Shoujing Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai
201203, P. R. China
Tel: 86-21-50801919(ex.45)
Fax: 86-21-50801922
Email: hanzg@chgc.sh.cn
This clone is available at CHGC in Shanghai.
FEATURES Location/Qualifiers
source 1..417
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="GKCDWE04"
/tissue_type="hepatocellular carcinoma"
/dev_stage="Adult"
/lab_host="SOLR"
/clone_lib="GKC"
/note="Vector: pBluescript sk(-); Site_1: EcoRI; Site_2:
XhoI"



ORIGIN 



Query Match 16.9%; Score 394.4; DB 9; Length 417;

Best Local Similarity 97.9%; Pred. No. 9.8e-83;

Matches 411; Conservative 0; Mismatches 6; Indels 3; Gaps 1;



Qy 1246 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 1305
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 60

Qy 1306 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 1365
        ||||||||||||||||||||||||||||||||||||   |||||||||||||||||||||
Db   61 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT---GGGTGCTATCCAGGACCGCGT 117

Qy 1366 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 1425
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  118 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 177

Qy 1426 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 1485
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  178 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 237

Qy 1486 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 1545
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  238 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 297

Qy 1546 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 1605
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  298 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 357

Qy 1606 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCT 1665
        ||||||||||||||||||||||||||||||||||||||||||||||||  ||  ||  ||
Db  358 ATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTCTCTTGTGCTCT 417



RESULT 10 
BM856449 

LOCUS BM856449 471 bp mRNA linear EST 06-MAR-2002 

DEFINITION K-EST0140406 S14K402 Homo sapiens cDNA clone S14K402-48-E04 5',
mRNA sequence.
ACCESSION BM856449



VERSION BM856449.1 GI:19212848 

KEYWORDS EST. 

SOURCE Homo sapiens (human) 

ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 471) 

AUTHORS Kim,N.S., Hahn,Y., Oh,J.H., Lee,J.Y., Ahn,H.Y., Chu,M.Y., Kim,M.R.,
Oh,K.J., Cheong,J.E., Sohn,H.Y., Kim,J.M., Park,H.S., Kim,S. and
Kim,Y.S.

TITLE 21C Frontier Korean EST Project 2001 

JOURNAL Unpublished (2002)
COMMENT Contact: Kim YS 

Genome Research Center 

Korea Research Institute of Bioscience & Biotechnology 
52 Eoeun-dong Yuseong-gu, Daejeon 305-333, South Korea 
Tel: +82-42-860-4470 
Fax: +82-42-860-4409 
Email: yongsung@mail.kribb.re.kr
Plate: 48 row: E column: 04 
High quality sequence stop: 471. 
FEATURES Location/Qualifiers 
source 1..471

/organism="Homo sapiens" 

/mol_type="mRNA" 

/db_xref="taxon:9606"
/clone="S14K402-48-E04"
/cell_line="K402"
/lab_host="Top10F'"
/clone_lib="S14K402"
/note="Organ: Stomach; Vector: pTZlSRPl; Site_1: EcoRI;
Site_2: NotI; The poly (A) + RNA was dephosphorylated with 
bacterial alkaline phosphatase (BAP) and then decapped 
with tabacco acid pyrophosphatase (TAP) . The decapped 
intact mRNA was ligated with DNA-RNA linker including EcoR 
I site by treatment of T4 RNA ligase and the first strand 
cDNA was synthesized from oligo dT-selected mRNA by 
priming with dT-tailed vector. The dT-tailed vector was 
adjusted to have about 60nt. The cDNA vector was 
circularized with E. coli DNA ligase after digestion of 
EcoRI which site is also included in vector. An RNA strand 
converted to a DNA strand by Okayama-Berg method. The 
obtained cDNA vectors were used for transformation of 
competent cells E. coli Top10F' by electroporation method.
The cDNA libraries constructed by this method are 
full-length enriched cDNA library." 

ORIGIN 

Query Match 15.6%; Score 365; DB 12; Length 471; 

Best Local Similarity 100.0%; Pred. No. le-75; 

Matches 365; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1976 TTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAA 2035
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 TTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAA 60

Qy 2036 ATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGC 2095
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 ATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGC 120

Qy 2096 TGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTT 2155
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 TGCCGACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTT 180

Qy 2156 GACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGC 2215
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 GACAGGACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGC 240

Qy 2216 AGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACAT 2275
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 AGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACAT 300

Qy 2276 GTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCT 2335
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCT 360

Qy 2336 ATGGG 2340
        |||||
Db  361 ATGGG 365



RESULT 11 

AV660973 

LOCUS AV660973 360 bp mRNA linear EST 16-JAN-2002
DEFINITION AV660973 GLC Homo sapiens cDNA clone GLCGNC08 3', mRNA sequence.
ACCESSION AV660973
VERSION AV660973.1 GI:9881987
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 360)
AUTHORS Xu,X., Huang,J., Xu,Z., Qian,B., Zhu,Z., Yan,Q., Cai,T., Zhang,X.,
Xiao,H., Qu,J., Liu,F., Huang,Q., Cheng,Z., Li,N., Du,J., Hu,W.,
Shen,K., Lu,G., Fu,G., Zhong,M., Xu,S., Gu,W., Huang,W., Zhao,X.,
Hu,G., Gu,J., Chen,Z. and Han,Z.
TITLE Insight into hepatocellular carcinogenesis at transcriptome level
by comparing gene expression profiles of hepatocellular carcinoma
with those of corresponding noncancerous liver
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 98 (26), 15089-15094 (2001)
MEDLINE 21625106
PUBMED 11752456
COMMENT Contact: Zeguang Han
Chinese National Human Genome Center at Shanghai
351 Guo Shoujing Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai
201203, P. R. China
Tel: 86-21-50801919(ex.45)
Fax: 86-21-50801922
Email: hanzg@chgc.sh.cn
This clone is available at CHGC in Shanghai.
FEATURES Location/Qualifiers
source 1..360
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="GLCGNC08"
/tissue_type="corresponding non cancerous liver tissue"
/dev_stage="Adult"
/lab_host="SOLR"
/clone_lib="GLC"
/note="Vector: pBluescript sk(-); Site_1: EcoRI; Site_2:
XhoI"



ORIGIN 



Query Match 15.4%; Score 360; DB 9; Length 360; 

Best Local Similarity 100.0%; Pred. No. 1.4e-74; 

Matches 360; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1577 GGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCAC 1636
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCAC 60

Qy 1637 TTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTC 1696
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTC 120

Qy 1697 AACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGA 1756
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 AACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGA 180

Qy 1757 AACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATAT 1816
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 AACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATAT 240

Qy 1817 TGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCA 1876
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 TGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCA 300

Qy 1877 AATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAG 1936
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 AATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAG 360



RESULT 12 

CA316999 

LOCUS CA316999 794 bp mRNA linear EST 09-JUL-2003
DEFINITION UI-M-FW0-cbm-a-08-0-UI.r1 NIH_BMAP_FW0 Mus musculus cDNA clone
IMAGE:6811377 5', mRNA sequence.
ACCESSION CA316999
VERSION CA316999.1 GI:24535123
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 794)
AUTHORS NIH-MGC http://mgc.nci.nih.gov/.
TITLE National Institutes of Health, Mammalian Gene Collection (MGC)
JOURNAL Unpublished (1999)
COMMENT Contact: Robert Strausberg, Ph.D.
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Dr. Jim Lin, University of Iowa
cDNA Library preparation: Dr. M. Bento Soares, University of Iowa
cDNA Library Arrayed by: Dr. M. Bento Soares, University of Iowa
DNA Sequencing by: Dr. M. Bento Soares, University of Iowa
Clone Distribution: MGC clone distribution information can be
found through the I.M.A.G.E. Consortium/LLNL at:
http://image.llnl.gov
This clone was contributed by the Brain Molecular Anatomy Project
(BMAP)
Seq primer: pYX-5.
FEATURES Location/Qualifiers 
source 1..794
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6"
/db_xref="taxon:10090"
/clone="IMAGE:6811377"
/tissue_type="whole brain"
/dev_stage="embryo 13.5, 14.5, 16.5, 17.5dpc"
/lab_host="DH10B (T1 phage resistant)"
/clone_lib="NIH_BMAP_FW0"
/note="Organ: Brain; Vector: pYX-Asc; Site_1: EcoR I;
Site_2: Not I; The library was constructed according
Bonaldo, Lennon and Soares, Genome Research, 6:791-806,
1996. Denatured RNA was size fractionated on a 1% agarose
gel. First strand cDNA synthesis was primed with oligo-dT
primer containing a Not I site. Double strand cDNA was
size selected according to mRNA size fraction, ligated
with EcoR I adaptor, digested with NotI and then cloned
directionally into pYX-Asc vector. The library tag
sequence located between the Not I site and the polyA tail
is AGCGAGACAG. This library was created for the University
Iowa Brain Anatomy Project (BMAP): 'Gene Discovery in the
Developing Mouse Nervous System', supported by National
Institute of Mental Health (NIMH), Hemin Chin, Ph.D.,
program coordinator."

ORIGIN 

Query Match 13.4%; Score 314.4; DB 14; Length 794; 

Best Local Similarity 71.8%; Pred. No. 1.5e-63;

Matches 481; Conservative 0; Mismatches 103; Indels 86; Gaps 2; 

Qy 370 AGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGAC 429

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
Db 87 AGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGAC 146

Qy 430 CTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTG 489

I I I I I I I I I I I I I I I I I I III 11 I I I II II I I I I I I I I I I I I I I I I I 

Db 147 CCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 206 

Qy 490 CTTCTCCTACG------------------------------------------------- 500

M I M I II I I I

Db 207 CTTCTCCTACGTCCTGCAGGTGGGCGTGTCCCTGGCCCTAGCCTGCCCGGGCTCTGGCCC 266

Qy 501 ------------------------------------TCCTGCAGAGCGACACCCTGCTGA 524

I I I I I I I II M I I I I II I I

Db 267 CTAGCCCCGGGATTTCGACGACCCTGATGTCCCCTTTCCTGCAGAGCGACGTTTTTCTGA 326



Qy 525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584 

I I I I I I I I I I M I I I I I I I I I I III I I I I I III I II I II II II I I I II II 

Db 327 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGCT 386 

Qy 585 ATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATG 644 

I I II II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 387 CCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACG 446

Qy 645 TGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCC 704

I I I I II I I I I I II II I I I III I II I I I I I I I I I II I II I II I I I I I I I 
Db 447 TGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCC 506 

Qy 705 GGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAA 764

I II I II I I I I II I II II II II I I I I I II I I I I I I I I I II I I I I I I I I I I I 
Db 507 GAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAA 566 

Qy 765 CCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTC 824 

I I I I I I I I I II I I I I I I I I I I I I I Mill I I I I I I I III III II I I I I I I I 

Db 567 CCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTC 626 

Qy 825 GCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCT 884 

II II I I II I I I I I I I I II I I I I I I I I II I I I I I II I I I I I I I I II II I II 

Db 627 GCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACT 686 

Qy 885 TTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAA 944 

I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I I I III I 
Db 687 TCGACANAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCAC-CCAGAGGAGA 745 

Qy 945 TGCTTGATTT 954 

I I I I I I I 
Db 746 TGCTGGTTCT 755 



RESULT 13 

AI597378 

LOCUS AI597378 393 bp mRNA linear EST 21-APR-1999
DEFINITION vj29c06.y1 Stratagene mouse diaphragm (#937303) Mus musculus cDNA
clone IMAGE:930442 5', mRNA sequence.
ACCESSION AI597378
VERSION AI597378.1 GI:4606426
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 393)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:535362
This read is a RESEQUENCE of a previously sequenced mouse clone
This read has been verified (found to hit its original self in the
correct orientation)
Seq primer: -40RP from Gibco
High quality sequence stop: 389.
FEATURES Location/Qualifiers 
source 1..393
/organism="Mus musculus"
/mol_type="mRNA"
/db_xref="taxon:10090"
/clone="IMAGE:930442"
/tissue_type="diaphragm"
/dev_stage="adult"
/lab_host="SOLR (kanamycin resistant)"
/clone_lib="Stratagene mouse diaphragm (#937303)"
/note="Organ: diaphragm; Vector: pBluescript SK-; Site_1:
EcoRI; Site_2: XhoI; Cloned unidirectionally from mRNA
prepared from diaphragm muscle. Primer: Oligo dT. Average
insert size: 1.5 kb. Uni-ZAP XR Vector; ~5' adaptor
sequence: 5' GAATTCGGCACGAG 3' ~3' adaptor sequence: 5'
CTCGAGTTTTTTTTTTTTTTTTTT 3'"

ORIGIN 

Query Match 13.2%; Score 309.8; DB 9; Length 393; 

Best Local Similarity 86.8%; Pred. No. 1.4e-62; 

Matches 341; Conservative 0; Mismatches 52; Indels 0; Gaps 0; 
Qy 1425 ATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGA 1484 

I I I I I I I I I I I I I I I I MM M M M M M M M M M M M I M M I M M I 

Db 1 ATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCATA 60 

Qy 1485 AGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCA 1544 

M I M M M M MM M M I I M M M M M I M M M M M I I M M I 

Db 61 AGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACGG 120 

Qy 1545 TGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTG 1604 

I M M M M M M M M I M M M I M M M M M M M I I M M I M M M 

Db 121 TCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTTG 180 

Qy 1605 GATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGC 1664 

M M M I M M M M M I I M M I M M M M M M I M M M M M I M M M I 

Db 181 GATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTGC 240 

Qy 1665 TACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTG 1724 

I M M M M M M M M M M M I M M M M I M M M M M M I Ml 

Db 241 TGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATCT 300 

Qy 1725 CGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTA 1784

I Ml M M I M M M M M M I M M M I M M IM M M M M M M I M M I 

Db 301 CTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTAA 360 



Qy 1785 AAATCATCAGTTATTTTACATTCCAAAAATATT 1817

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 AAATCCTGGGTTATTTTACATTCCAAAAATACT 393 



RESULT 14 

AA656720 

LOCUS AA656720 424 bp mRNA linear EST 04-NOV-1997
DEFINITION vp95e08.r1 Stratagene mouse diaphragm (#937303) Mus musculus cDNA
clone IMAGE:1092518 5' similar to SW:SCRT_DROME P45843 SCARLET
PROTEIN. ;, mRNA sequence.
ACCESSION AA656720
VERSION AA656720.1 GI:2592874
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 424)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project
JOURNAL Unpublished (1996)
COMMENT Contact: Marra M/Mouse EST Project
WashU-HHMI Mouse EST Project
Washington University School of MedicineP
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:598750
Seq primer: -28m13 rev1 ET from Amersham.
FEATURES Location/Qualifiers
source 1..424

/organism="Mus musculus" 
/mol_type="mRNA" 
/db_xref="taxon: 10090" 
/clone="IMAGE: 1092518" 
/tissue_type="diaphragm" 
/ dev_stage="adult" 

/lab_host="SOLR (kanamycin resistant)" 
/clone_lib="Stratagene mouse diaphragm (#937303) " 
/note=="Organ: diaphragm; Vector: pBluescript SK-; Site_l: 
EcoRI; Site_2: Xhol; Cloned unidirectionally from mRNA 
prepared from diaphragm muscle. Primer: Oligo dT. Average 
insert size: 1.5 kb. Uni-ZAP XR Vector; ~5' adaptor 
sequence: 5* GAATTCGGCACGAG 3* --3* adaptor sequence: 5* 
CTCGAGTTTTTTTTTTTTTTTTTT 3 ' " 



ORIGIN 



Query Match 13.1%; Score 307.2; DB 9; Length 424;

Best Local Similarity 82.8%; Pred. No. 5.8e-62;

Matches 351; Conservative 0; Mismatches 73; Indels 0; Gaps 0;

Qy 1186 TTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGT 1245
        | |||||||  | |||   || || ||||| || |||||| |||| |||||||||||  |
Db    1 TCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTAAT 60

Qy 1246 GAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCT 1305
        ||| ||||||| ||||||||||| ||||||| |||||||||||||||||||  | |||||
Db   61 GAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTCCT 120

Qy 1306 CCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGT 1365
        | |||||| | |||| || ||||  | |||   ||||||||| ||| | |||||||||||
Db  121 CATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGCGT 180

Qy 1366 AGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAA 1425
         || || || || ||| ||||||| |||||||| ||||| |||||||| || ||||||||
Db  181 GGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTGAA 240

Qy 1426 TCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAA 1485
        |||||||||| ||||| |||| ||||||||||||||||||||||| ||||| || || ||
Db  241 TCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCATAA 300

Qy 1486 GTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCAT 1545
        |||||||||| |||| ||||| |  || ||||||||||||||||||||  | |||||  |
Db  301 GTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACGGT 360

Qy 1546 GATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGG 1605
         ||||||||||||||||| || ||||| ||||||||  ||||||| |||||| |||||||
Db  361 CATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTTGG 420

Qy 1606 ATAT 1609
        ||||
Db  421 ATAT 424



RESULT 15 

T93792/C 

LOCUS T93792 336 bp mRNA linear EST 23-MAR-1995
DEFINITION ye05f01.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
IMAGE:116857 3', mRNA sequence.
ACCESSION T93792
VERSION T93792.1 GI:726965
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 336)
AUTHORS Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
Wilson,R.
TITLE The WashU-Merck EST Project
JOURNAL Unpublished (1995)
COMMENT Contact: Wilson RK
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: est@watson.wustl.edu
Insert Size: 768
High quality sequence stops: 265 Source: IMAGE Consortium, LLNL
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
Insert Length: 768 Std Error: 0.00
Seq primer: -21m13
High quality sequence stop: 265.
FEATURES Location/Qualifiers
source 1..336
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="GDB:472474"
/db_xref="taxon:9606"
/clone="IMAGE:116857"
/sex="male"
/dev_stage="20 week-post conception fetus"
/lab_host="DH10B (ampicillin resistant)"
/clone_lib="Soares fetal liver spleen 1NFLS"
/note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
1st strand cDNA was primed with a Pac I - oligo(dT) primer
[5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Pac I and cloned into the Pac I
and Eco RI sites of the modified pT7T3 vector. Library
went through one round of normalization. Library
constructed by Bento Soares and M.Fatima Bonaldo."



ORIGIN 



Query Match 12.7%; Score 297; DB 14; Length 336;

Best Local Similarity 96.9%; Pred. No. 1.4e-59;

Matches 312; Conservative 0; Mismatches 8; Indels 2; Gaps 1;

Qy 2014 CCTAGGAATAGTTGTTTTCAAAATAAGGG--ATCATCTCATTAGCAGGTAGTGAAAGCCA 2071
        |||||||| ||||| ||||||||||||||       ||||||||||||||||||||||||
Db  322 CCTAGGAANAGTTGNTTTCAAAATAAGGGGATCATCCTCATTAGCAGGTAGTGAAAGCCA 263

Qy 2072 TGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAAT 2131
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  262 TGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGAACGTCTGAAAT 203

Qy 2132 GAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTC 2191
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  202 GAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAACCATTAAGACTC 143

Qy 2192 CATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCT 2251
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  142 CATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGTTTATAGTCCCT 83

Qy 2252 TGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAG 2311
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   82 TGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGAGCGGACCCAAG 23

Qy 2312 AATGTAAATAATATTCATAAAC 2333
        |||||||||||||||||||| |
Db   22 AATGTAAATAATATTCATAANC 1
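
In the alignment above the Db coordinates run downward (322 to 1) and the result identifier carries a "/C" suffix, which appears to indicate a hit on the complementary strand of the database entry. The sketch below is an illustration under that assumption, not part of the search output: it reverse-complements the last displayed Db segment to give the bases as they would read on the stored strand, positions 1 to 22. The function name is hypothetical.

```python
# Complement table for DNA; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Db segment displayed with coordinates 22 down to 1.
displayed = "AATGTAAATAATATTCATAANC"
stored = reverse_complement(displayed)
print(stored)  # bases 1..22 of the entry as stored, under the stated assumption
```

Applying the function twice returns the original string, which is a quick sanity check on the complement table.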

Search completed: February 26, 2004, 09:39 
Job time : 3959.5 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: February 26, 2004, 00:40:23 ; Search time 6010.48 Seconds
(without alignments)
16874.299 Million cell updates/sec

Title: US-09-989-981A-5
Perfect score: 2340
Sequence: 1 gtcaggtggagcaggcaggg aatattcataaacctatggg 2340

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 3470272 seqs, 21671516995 residues

Total number of hits satisfying chosen parameters: 6940544

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
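
The throughput figure in this header can be reproduced from the other header values. The sketch below is an illustration only: it assumes a "cell update" is one query-residue by database-residue comparison, and that both strands were searched. The factor of two is an inference from the hit count being exactly twice the number of sequences (6940544 = 2 x 3470272), not something the report states. The function name is hypothetical.

```python
def cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second.

    ASSUMPTION: strands=2, i.e. the query was compared against both
    strands of every database sequence.
    """
    return strands * query_len * db_residues / seconds


# Values taken from the header: query length, residues searched, search time.
rate = cell_updates_per_sec(2340, 21_671_516_995, 6010.48)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

Under these assumptions the result lands within a fraction of a million of the reported 16874.299 Million cell updates/sec.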



Database : GenEmbl:*

1: gb_ba:*
2: gb_htg:*
3: gb_in:*
4: gb_om:*
5: gb_ov:*
6: gb_pat:*
7: gb_ph:*
8: gb_pl:*
9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*
28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*
37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 
AX320883 

LOCUS AX320883 2340 bp DNA linear PAT 14-DEC-2001 

DEFINITION Sequence 4 from Patent WO0179272. 
ACCESSION AX320883 

VERSION AX320883.1 GI:17902433 

KEYWORDS 

SOURCE Homo sapiens (human) 

ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Tian,H., Schultz,J. and Shan,B.
TITLE Sitosterolemia susceptibility gene (ssg): compositions and methods
of use
JOURNAL Patent: WO 0179272-A 4 25-OCT-2001;
Tularik Inc. (US)
FEATURES Location/Qualifiers
source 1..2340
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
/note="human sitosterolemia gene (SSG)"
CDS 107..2062
/note="unnamed protein product; human sitosterolemia
susceptibility gene (SSG) protein"
/codon_start=1
/protein_id="CAD19409.1"
/db_xref="GI:17902434"
/db_xref="REMTREMBL:CAD19409"
/translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
PTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGT
PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
QESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLA
PHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
FQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
LILYSFIPALVILGIVVFKIRDHLISR"

ORIGIN 

Query Match 100.0%; Score 2340; DB 6; Length 2340; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy  241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300

Qy  301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360

Qy  361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420

Qy  421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480

Qy  481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy  541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600

Qy  601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660

Qy  661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720

Qy  721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

Qy  781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

Qy  841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900

Qy  901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960

Qy  961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

Qy 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260

Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380

Qy 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740

Qy 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800

Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980

Qy 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

Qy 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

Qy 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160

Qy 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy 2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340



RESULT 2 
AX685733 
LOCUS AX685733 2340 bp DNA linear PAT 29-MAR-2003
DEFINITION Sequence 5 from Patent WO02081691.
ACCESSION AX685733
VERSION AX685733.1 GI:29371742
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
TITLE Abcg5 and abcg8: compositions and methods of use
JOURNAL Patent: WO 02081691-A 5 17-OCT-2002;
Tularik Inc. (US) ; BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
(US)
FEATURES Location/Qualifiers
source 1..2340
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
CDS 107..2062
/note="unnamed protein product; human ABCG5 (hABCG5)"
/codon_start=1
/protein_id="CAD86572.1"
/db_xref="GI:29371743"
/db_xref="REMTREMBL:CAD86572"
/translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
PTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGT
PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
QESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLA
PHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
FQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
LILYSFIPALVILGIVVFKIRDHLISR"



ORIGIN 



Query Match 100.0%; Score 2340; DB 6; Length 2340;

Best Local Similarity 100.0%; Pred. No. 0;

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy  241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300

Qy  301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360

Qy  361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420

Qy  421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480

Qy  481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy  541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600

Qy  601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660

Qy  661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720

Qy  721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

Qy  781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

Qy  841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900

Qy  901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960

Qy  961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

Qy 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260

Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380

Qy 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740

Qy 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800

Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980



Qy   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

Qy   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

Qy   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160

Qy   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340



RESULT 3
AF320293
LOCUS       AF320293                2340 bp    mRNA    linear   PRI 13-DEC-2000
DEFINITION  Homo sapiens ABCG5 (ABCG5) mRNA, complete cds.
ACCESSION   AF320293
VERSION     AF320293.1  GI:11692799
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2340)
  AUTHORS   Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
            Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.
  TITLE     Accumulation of Dietary Cholesterol in Sitosterolemia Caused by
            Mutations in Adjacent ABC Transporters
  JOURNAL   Science (2001) In press
REFERENCE   2  (bases 1 to 2340)
  AUTHORS   Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
            Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-NOV-2000) Molecular Genetics, University of Texas,
            Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd.,
            Dallas, TX 75390-9046, USA
FEATURES             Location/Qualifiers
     source          1..2340
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
     gene            1..2340
                     /gene="ABCG5"
     CDS             107..2062
                     /gene="ABCG5"
                     /note="ATP-binding cassette, subfamily G, member 5"
                     /codon_start=1
                     /product="ABCG5"
                     /protein_id="AAG40003.1"
                     /db_xref="GI:11692800"
                     /translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
                     YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
                     LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
                     PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
                     PTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGT
                     PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
                     CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
                     NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
                     QESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLA
                     PHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
                     FQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
                     LILYSFIPALVILGIVVFKIRDHLISR"

ORIGIN 

Query Match 100.0%; Score 2340; DB 9; Length 2340; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
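
The summary figures above can be reproduced with a short sketch. It assumes that "Query Match" is the hit score expressed as a percentage of the perfect score (2340 for this query); the report does not state this, but the assumption agrees with the values it prints (for example, a score of 429 is listed as 18.3%). The function names are illustrative and are not part of the search software.

```python
def query_match_percent(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's perfect score (assumed definition)."""
    return round(100.0 * score / perfect_score, 1)


def identity_counts(qy: str, db: str) -> tuple[int, int]:
    """Return (matches, mismatches) for one ungapped Qy/Db alignment block."""
    if len(qy) != len(db):
        raise ValueError("ungapped blocks must have equal length")
    matches = sum(a == b for a, b in zip(qy.upper(), db.upper()))
    return matches, len(qy) - matches


PERFECT_SCORE = 2340  # "Perfect score" from the report header

print(query_match_percent(2340, PERFECT_SCORE))  # 100.0
print(query_match_percent(429, PERFECT_SCORE))   # 18.3

# First 60-base block of this result: query and database lines are identical.
block = "GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA"
print(identity_counts(block, block))             # (60, 0)
```

Summing the per-block counts over all 39 blocks should give the "Matches" and "Mismatches" totals printed in the summary.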

Qy      1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy     61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy    121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy    181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy    241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300

Qy    301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360

Qy    361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420

Qy    421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480



Qy    481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy    541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600

Qy    601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660

Qy    661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720

Qy    721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

Qy    781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

Qy    841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900

Qy    901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960

Qy    961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020

Qy   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

Qy   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

Qy   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260

Qy   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320


Qy   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380

Qy   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500

Qy   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560

Qy   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

Qy   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740

Qy   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800

Qy   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

Qy   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

Qy   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980

Qy   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

Qy   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

Qy   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160


Qy   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340



RESULT 4
AF312715
LOCUS       AF312715                2740 bp    mRNA    linear   PRI 14-JUN-2001
DEFINITION  Homo sapiens sterolin (ABCG5) mRNA, complete cds.
ACCESSION   AF312715
VERSION     AF312715.2  GI:14423628
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2740)
  AUTHORS   Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
            Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
            Salen,G., Dean,M. and Patel,S.B.
  TITLE     Identification of a gene, ABCG5, important in the regulation of
            dietary cholesterol absorption
  JOURNAL   Nat. Genet. 27 (1), 79-83 (2001)
  MEDLINE   20578753
   PUBMED   11138003
REFERENCE   2  (bases 1 to 2740)
  AUTHORS   Lu,K., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-OCT-2000) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB541, Charleston, SC 29403, USA
COMMENT     On Jun 14, 2001 this sequence version replaced gi:12382303.
FEATURES             Location/Qualifiers
     source          1..2740
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="2p21; between D2S2294 and D2S2298"
                     /tissue_type="liver"
     gene            1..2740
                     /gene="ABCG5"
     CDS             141..2096
                     /gene="ABCG5"
                     /codon_start=1
                     /product="sterolin"
                     /protein_id="AAG53099.1"
                     /db_xref="GI:12382304"
                     /translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
                     YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
                     LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
                     PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
                     PTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGT
                     PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
                     CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
                     NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
                     QESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLA
                     PHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
                     FQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
                     LILYSFIPALVILGIVVFKIRDHLISR"

ORIGIN 



Query Match 100.0%; Score 2340; DB 9; Length 2740; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
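
Because this alignment has no indels, each query position maps to a database position by a constant shift. The sketch below assumes the alignment starts at database position 35 (the first Db coordinate listed for this result); the function name is hypothetical and used only for illustration.

```python
QUERY_LEN = 2340  # length of the query sequence
DB_START = 35     # database position aligned to query position 1 (assumed)


def query_to_db(pos: int, db_start: int = DB_START) -> int:
    """Map a 1-based query position to its database position (ungapped alignment)."""
    if not 1 <= pos <= QUERY_LEN:
        raise ValueError("position outside the query")
    return pos + db_start - 1


# Ends of the aligned region on the database sequence.
print(query_to_db(1), query_to_db(2340))    # 35 2374

# The CDS annotated at 107..2062 in AF320293 (same coordinates as the query)
# lands on the CDS annotated at 141..2096 in AF312715.
print(query_to_db(107), query_to_db(2062))  # 141 2096
```

The agreement between the shifted CDS coordinates and the CDS annotated in this record is a quick consistency check on the alignment offset.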



Qy      1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     35 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 94

Qy     61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     95 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 154

Qy    121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    155 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 214

Qy    181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    215 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 274

Qy    241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    275 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 334

Qy    301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    335 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 394

Qy    361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    395 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 454

Qy    421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    455 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 514

Qy    481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    515 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 574

Qy    541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    575 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 634

Qy    601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    635 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 694

Qy    661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    695 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 754

Qy    721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    755 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 814

Qy    781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    815 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 874

Qy    841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    875 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 934

Qy    901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    935 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 994

Qy    961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    995 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1054

Qy   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1055 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1114

Qy   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1115 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1174

Qy   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1175 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1234

Qy   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1235 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1294

Qy   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1295 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1354

Qy   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1355 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1414

Qy   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1415 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1474

Qy   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1475 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1534

Qy   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1535 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1594

Qy   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1595 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1654

Qy   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1655 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1714

Qy   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1715 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1774

Qy   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1775 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1834

Qy   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1835 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1894

Qy   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1895 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1954

Qy   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1955 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 2014

Qy   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2015 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2074

Qy   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2075 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2134

Qy   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2135 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2194

Qy   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2195 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2254

Qy   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2255 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2314

Qy   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2315 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2374
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artificial sequences. 
1 

Patel,S.B. and Dean,M. 

Gene involved in dietary sterol absorption and excretion and uses 
therefor 

Patent: WO 0227016-A 42 04-APR-2002; 

THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
Shailendra B. (US); Dean, Michael (US)

Location/Qualifiers

1..2516

/organism="synthetic construct"
/mol_type="unassigned DNA"
/db_xref="taxon:32630"
/note="Primer"



ORIGIN 



Query Match 99.9%; Score 2338.4; DB 6; Length 2516;

Best Local Similarity 100.0%; Pred. No. 0;

Matches 2339; Conservative 0; Mismatches 1; Indels 0; Gaps 0;

Qy      1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     35 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 94



Qy   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   95 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 154

Qy  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  155 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 214

Qy  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  215 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 274

Qy  241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  275 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 334

Qy  301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  335 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 394



Qy  361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  395 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 454



Qy  421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  455 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 514

Qy  481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  515 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 574

Qy  541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  575 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 634

Qy  601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  635 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 694

Qy  661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  695 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 754

Qy  721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  755 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 814

Qy  781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  815 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 874

Qy  841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  875 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 934

Qy  901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  935 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 994

Qy  961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  995 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1054

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1055 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1114

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1115 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1174

Qy 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1175 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1234

Qy 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
        |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
Db 1235 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTTACAAGAAACTTGGTGAGAAATAAGCTGGC 1294



Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1295 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1354

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1355 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1414

Qy 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1415 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1474

Qy 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1475 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1534

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1535 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1594

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1595 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1654

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1655 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1714

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1715 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1774

Qy 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1775 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1834

Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1835 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1894

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1895 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1954

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1955 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 2014

Qy 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2015 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2074

Qy 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2075 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2134

Qy 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2135 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2194

Qy 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2195 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2254

Qy 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2255 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2314

Qy 2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2315 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2374



RESULT 6 
AX456519 
LOCUS 

DEFINITION 
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ORGANISM 
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ORIGIN 



linear PAT 06-JUL-2002 



AX456519 1920 bp DNA 

Sequence 41 from Patent WO0227016. 
AX456519 

AX456519.1 GI:21715409



synthetic construct 
synthetic construct 
artificial sequences. 
1 

Patel,S.B. and Dean,M.

Gene involved in dietary sterol absorption and excretion and uses 
therefor 

Patent: WO 0227016-A 41 04-APR-2002; 

THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
Shailendra B. (US); Dean, Michael (US)

Location/Qualifiers

1..1920

/organism="synthetic construct"
/mol_type="unassigned DNA"
/db_xref="taxon:32630"
/note="Primer"



Query Match 82.1%; Score 1920; DB 6; Length 1920;
Best Local Similarity 100.0%; Pred. No. 0;
Matches 1920; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy  143 ATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCC 202
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 ATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCC 60



Qy  203 CCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGG 262
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGG 120



Qy  263 CCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTC 322
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 CCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTC 180



Qy  323 TCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAA 382
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 TCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAA 240

Qy  383 ACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAG 442
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 ACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAG 300

Qy  443 GTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTC 502
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTC 360

Qy  503 CTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCG 562
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 CTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCG 420

Qy  563 CTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATG 622
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATG 480

Qy  623 GCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGC 682
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 GCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGC 540

Qy  683 ATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAG 742
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 ATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAG 600

Qy  743 GTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTC 802
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 GTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTC 660

Qy  803 GTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCC 862
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 GTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCC 720

Qy  863 CGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATT 922
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 CGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATT 780

Qy  923 TTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCT 982
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 TTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCT 840

Qy  983 GAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAG 1042
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  841 GAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAG 900

Qy 1043 GAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCA 1102
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  901 GAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCA 960

Qy 1103 GCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCA 1162
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  961 GCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCA 1020



Qy 1163 ATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTG 1222
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 ATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTG 1080

Qy 1223 AGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAG 1282
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 AGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAG 1140

Qy 1283 AATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTA 1342
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 AATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTA 1200

Qy 1343 AAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTAC 1402
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 AAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTAC 1260

Qy 1403 ACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAG 1462
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 ACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAG 1320

Qy 1463 AGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTC 1522
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 AGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTC 1380

Qy 1523 CCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTA 1582
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 CCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTA 1440

Qy 1583 CATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATT 1642
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 CATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATT 1500

Qy 1643 GGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGT 1702
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 GGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGT 1560

Qy 1703 GTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATA 1762
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 GTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATA 1620

Qy 1763 CAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGT 1822
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 CAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGT 1680

Qy 1823 GAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTT 1882
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 GAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTT 1740

Qy 1883 TCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACC 1942
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 TCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACC 1800

Qy 1943 TGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCA 2002
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 TGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCA 1860

Qy 2003 GCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 2062
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 GCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 1920



RESULT 7 
AY195873 

LOCUS AY195873 2351 bp mRNA linear ROD 01-JUN-2003

DEFINITION Mus musculus strain PERA/Ei ATP-binding cassette sub-family G
member 5 (Abcg5) mRNA, complete cds.
ACCESSION AY195873

VERSION AY195873.1 GI:31322257

KEYWORDS 

SOURCE Mus musculus (house mouse) 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 2351) 

AUTHORS Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
Paigen,B.

TITLE Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
Mice

JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 2351) 

AUTHORS Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
TITLE Direct Submission 

JOURNAL Submitted (11-DEC-2002) The Jackson Laboratory, 600 Main Street,
Bar Harbor, ME 04609, USA
FEATURES Location/Qualifiers
source 1..2351
/organism="Mus musculus"
/mol_type="mRNA"
/strain="PERA/Ei"
/db_xref="taxon:10090"
/chromosome="17"
/map="55 cM"
/sex="male"
/tissue_type="liver"
gene 1..2351
/gene="Abcg5"
CDS 139..2097
/gene="Abcg5"
/note="ATP-dependent canalicular cholesterol transporter;
white subfamily"
/codon_start=1
/product="ATP-binding cassette sub-family G member 5"
/protein_id="AAO45094.1"
/db_xref="GI:31322258"

/translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD
EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD
IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL
APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
TFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
FLILYGFIPALVILGIVIFKVRDYLISR"

ORIGIN 

Query Match 60.3%; Score 1410.8; DB 10; Length 2351; 

Best Local Similarity 80.4%; Pred. No. 1.4e-296; 

Matches 1665; Conservative 0; Mismatches 402; Indels 3; Gaps 1; 

Qy 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I
Db 57 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 116

Qy 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144

I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III
Db 117 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 176

Qy 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204

III II I I I I II I I I M I I II I I M I I I II I I II I I I I I I I
Db 177 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 236

Qy 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAG 261

II I I I I I I I I I I I I I I I I II II I I I I I I I I I II I I I I I II I I I I
Db 237 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGG 296

Qy 262 GCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGT 321

III I II I I I II I I I I III I I I I I I I I Mill I I I I I I I I I I M I I I I I I I I
Db 297 GCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGT 356

Qy 322 CTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAA 381

I I I I M I I I I I I I I I I II Mill I II I II II I I I II M I I I I I M I I I I I I I
Db 357 CTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAA 416

Qy 382 AACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGA 441

I I II II II I I II I I II I I I I II I M I I I I I II II I II I Mini I I I I II
Db 417 GACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGA 476

Qy 442 GGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGT 501

II I I I II II II III I I I M I I I I II II I II I I II I M M I I I II I II II II I
Db 477 GGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGT 536

Qy 502 CCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGC 561

II I I I I I M I I I I I I II I II II II I I I I I II I II I II M I III II I I I II
Db 537 CCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC 596

Qy 562 GCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCAT 621

I M II I I I I II I II I I II I I II II I I II M II II I I I II I II II
Db 597 GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCAT 656

Qy 622 GGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGG 681

I I II I I I II I I I I I I II I I I I I II I II I I II II I I I II III I II I II II
Db 657 GACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGG 716

Qy 682 CATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAA 741

II M I II II I M I I I I I II I II II I II I I I I II I I I II II I I I II II M
Db 717 AATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAA 776

Qy 742 GGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGT 801

I I I I I M II I I I I II I I I II I I I I I II I I I I I I I I I I I I I I I I I Mill I I I I I
Db 777 GGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGT 836

Qy 802 CGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCC 861

I I Ml III II I II I I II I II II II M I II II II II M M I I I I II I M I I
Db 837 CCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCC 896

Qy 862 CCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGAT 921

II II II II I I II II I III II I II I II I I I I II II I II I I II II II II I
Db 897 TCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGT 956

Qy 922 TTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCC 981

II I II I I II I I I I II III II II I I I I II I M II I I II I II II II M II II I
Db 957 GTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCC 1016

Qy 982 TGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAA 1041

II II II I M II M I I II I II I I I I II II I MM M II I II I II I II I I II I
Db 1017 TGAACATTCCAATCCCTTTGATTTCTACATGGACTTGACATCAGTGGACACCCAAAGCAG 1076

Qy 1042 GGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATC 1101

M II I II I II II I II I I II I I MM II I I II I II II II I II I II I II II
Db 1077 AGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATC 1136

Qy 1102 AGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACC 1161

I II I II I II II I I I II M II II II I II I II II II M II I I I II
Db 1137 TGACATCTATCACAAAATTCTGGAGAACATTGAAAGAGCACGATACCTGAAAACCTTACC 1196

Qy 1162 AATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCT 1221

I II M II I I II II M I I I II II I M II II I I II II II II II I II I I
Db 1197 CACGGTTCCTTTCAAAACAAAAGATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCT 1256

Qy 1222 GAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCA 1281

II I I II II II M II II I II II II II I II I I II II II II II I II II I II II I I
Db 1257 GAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCA 1316

Qy 1282 GAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT 1341

II I I II II II II II I I II II M I II II I I I II I II II II I II I III
Db 1317 GAATCTGATCATGGGCCTCTTCCTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCT 1376

Qy 1342 AAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTA 1401

II II II III I MM I II II II II III II II II I II II II II II
Db 1377 AAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATA 1436

Qy 1402 CACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGA 1461

II I II II II II II I II II II II I I II I II I I I I II I II II II II II I II II II I
Db 1437 CACCGGCATGCTCAATGCTGTGAATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGA 1496

Qy 1462 GAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCT 1521

II I II M II II I II II II II II I II II II I I II I II II I I II I II M M I
Db 1497 GAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCT 1556

Qy 1522 CCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTT 1581

I II M II II I II I II II I I M II II II I II II II I I II II II I I II II II I
Db 1557 CCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTT 1616

Qy 1582 ACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAAT 1641

II II II I II II II II II M I II II II II II II II II I II II I II I II II II II
Db 1617 GTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAAT 1676



Qy 1642 TGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAG 1701 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I M I I I 
Db 1677 TGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAG 1736 

Qy 1702 TGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACAT 1761

I M M I I I I II I I I IN I III I I I I I I I I I II I I I I M I I I M I I I I I I 

Db 1737 TATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACAT 1796 

Qy 1762 ACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAG 1821

II I I I I I I I I II I I I M I II I I I I I I I I I I I M I I I I I I I I I I I I I I I I I II I 

Db 1797 ACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTG 1856

Qy 1822 TGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGT 1881 

I M I I I I I I II I I I I I I I I I I I Mill I I I I I I I I I I I I I I I I I II II 
Db 1857 TGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACAC 1916 

Qy 1882 TTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAAC 1941 

I II M I I I I II I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 1917 CTCTATGCTAAATCACCCGATGTGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAAC 1976 

Qy 1942 CTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCC 2001 

I I I II I I I I I I I I I I I I I II I I I I I I I I I I II II M III I I I I I II 

Db 1977 CTGCCCAGGTGCTACATCCAGATTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCC 2036 

Qy 2002 AGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTA 2061 

I I I M I I I I I II I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2037 AGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATA 2096 

Qy 2062 GTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091 

MM I I Mill I III 

Db 2097 GTTAAGATGACAGGCAGGAAAGGGTTAATG 2126 
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AX456524 2354 bp DNA 

Sequence 46 from Patent WO0227016. 
AX456524 

AX456524.1 GI:21715413



synthetic construct 
synthetic construct 
artificial sequences. 
1 

Patel,S.B. and Dean,M. 

Gene involved in dietary sterol absorption and excretion and uses 
therefor 

Patent: WO 0227016-A 46 04-APR-2002; 

THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
Shailendra B. (US); Dean, Michael (US)

Location/Qualifiers

1..2354

/organism="synthetic construct"
/mol_type="unassigned DNA"
/db_xref="taxon:32630"
/note="Primer"



ORIGIN



Query Match 60.2%; Score 1409.2; DB 6; Length 2354; 

Best Local Similarity 80.4%; Pred. No. 3.2e-296; 

Matches 1664; Conservative 0; Mismatches 403; Indels 3; Gaps 1; 

Qy 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 57 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 116 

Qy 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 

Db 117 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 176 

Qy 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

III II I I I I M I I I I I I II I II II I I II I I I I I I I I I I I I 

Db 177 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 236 

Qy 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAG 261

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 237 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGG 296

Qy 262 GCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGT 321 

III I I I I I I I I I I I I III I I I I I I I I I I I I I I I M I I I I I I I I M I I II I I 
Db 297 GCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGT 356

Qy 322 CTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAA 381 

I II I I I II I I I I I I M II I I I I I I M I I I I I I I I I I I I I I I II M I I I I I I I 

Db 357 CTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAA 416 

Qy 382 AACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGA 441 

I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I Mill I I I I I I I I I II I I 

Db 417 GACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGA 476 

Qy 442 GGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGT 501

II I I I I I I I I I III I I I I I M I I I I I I I I I I I M I M I I I I I M M I I I I I I 

Db 477 GGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGT 536 

Qy 502 CCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGC 561 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I III I I I I I II 

Db 537 CCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC 596 

Qy 562 GCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCAT 621 

I I I I I I I I I II Mill II I I II II I I I I II II I I I II II II I I I 

Db 597 GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCAT 656

Qy 622 GGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGG 681 

I I II I II I I II II II II II I II II I II I I I I II I II M III I M II I I I 
Db 657 GACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGG 716 

Qy 682 CATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAA 741 

II M II I II II I I I II I I II II I I M II I I I I I I II II II I I I I I II M 

Db 717 AATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAA 776 

Qy 742 GGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGT 801 

M M I M II I II II II II II II I I II I II II I I I I I II I I I II I II II I Mill 



Db 777 GGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGT 836



Qy 802 CGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCC 861

I I III III II II I I I I I I I I II M I I II I I I I I II I I I I I I I I I I I I I II 

Db 837 CCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCC 896 

Qy 862 CCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGAT 921 

II I I I I I I I I II II I III I I I II II I I I I II I I I I I I I I I I I I I I II I 

Db 897 TCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGT 956 

Qy 922 TTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCC 981 

M I I I I I I I M I I I I III I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 957 GTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCC 1016 

Qy 982 TGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAA 1041 

M I I I I I I I II II I I I M II II I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 1017 TGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAG 1076

Qy 1042 GGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATC 1101

II I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II I 

Db 1077 AGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATC 1136

Qy 1102 AGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACC 1161 

I M I I I I M I I I I I M II I I I I I I I I I I I I I I I M I I I I I I I I 

Db 1137 TGACATCTATCACAAAATTCTGGAGAACATTGAAAGAGCACGATACCTGAAAACCTTACC 1196 

Qy 1162 AATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCT 1221

I I I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I 

Db 1197 CACGGTTCCTTTCAAAACAAAAGATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCT 1256 

Qy 1222 GAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCA 1281 

MM II I I II I II I I M II I I I I I I I I I I I I I I I I I I I I II I I M I I I I I II 

Db 1257 GAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCA 1316 

Qy 1282 GAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT 1341 

I I I I I II I I I I I II I I II I II I I I I I II I I I I I II II I I I I II III 

Db 1317 GAATCTGATCATGGGCCTCTTCCTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCT 1376

Qy 1342 AAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTA 1401 

I I I I I I III I I I I II I I I I II II II II II III I I I I M I I II I I I I I II 
Db 1377 AAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATA 1436 

Qy 1402 CACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGA 1461 

III I II I II I I II I I I I I I I I I II I I M I I I Mill I II I II I II I I II I I I II 
Db 1437 CACCGGCATGCTCAATGCTGTGAATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGA 1496 

Qy 1462 GAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCT 1521 

I M I I I I I I II I II II II II I M II I I I II I II I I II I I I M I I I I I I I I 

Db 1497 GAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCT 1556 

Qy 1522 CCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTT 1581 

I I I II M I II I I I I I II I I II I I I M M I I II II M M II I I I I I I II II I 

Db 1557 CCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTT 1616 

Qy 1582 ACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAAT 1641 

I I I I I I I I I I II I I II I I II I II I I I II I I I I I I II I II II I I II II I I II II 

Db 1617 GTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAAT 1676



Qy 1642 TGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAG 1701 

III I I I I I I I I M I I II I I I I I I I I I I I I I I I I I I I I I II I I II I I M I I II I 
Db 1677 TGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAG 1736 

Qy 1702 TGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACAT 1761 

I I I I M I I II I I I I III I IN Mill I I I I I I II I I I M I I I I I I I I M 

Db 1737 TATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACAT 1796 

Qy 1762 ACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAG 1821 

I I I I I I I I I M I I M I I I I M II I I I I I I I I I I I I I I I I I I I I I I I II M II I 
Db 1797 ACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTG 1856 

Qy 1822 TGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGT 1881 

I I I II I I I I II I I I I II I I I I I I I I I I I I I II I I I I I I I I II I I II II 
Db 1857 TGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACAC 1916 

Qy 1882 TTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAAC 1941

I I I I I I I I I I I I I I I I I I I M I I I M I I I II I I I I I I I I II M I 
Db 1917 CTCTATGCTAAATCACCCGATGTGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAAC 1976

Qy 1942 CTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCC 2001

I I I I I I I I II I I I M I I I II I I M I I I I I I M II II Ml I II I I II 

Db 1977 CTGCCCAGGTGCTACATCCAGATTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCC 2036

Qy 2002 AGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTA 2061

I I I I I I I I I I I I I I I I I I I M I I I I I I IN I I I I I I I II I I I I I I I I II 
Db 2037 AGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATA 2096

Qy 2062 GTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091 

I I I I I I I I I I I I III 

Db 2097 GTTAAGATGACAGGCAGGAAAGGGTTAATG 2126
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AF312713 2354 bp mRNA linear ROD 16-MAY-2001 

Mus musculus sterolin (Abcg5) mRNA, complete cds.

AF312713 

AF312713.2 GI:14091944

Mus musculus (house mouse) 
Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

1 (bases 1 to 2354) 

Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
Salen,G., Dean,M. and Patel,S.B.
Identification of a gene, ABCG5, 
dietary cholesterol absorption 
Nat. Genet. 27 (1), 79-83 (2001) 
20578753 
11138003 

2 (bases 1 to 2354) 
Lu,K., Lee,M.-H. and Patel,S.B. 
Direct Submission 
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Submitted ( 12-OCT-2000) Division of Endocrinology, Diabetes and 

Medical Genetics, Medical University of South Carolina, 114 Doughty 

St, STB 541, Charleston, SC 29403, USA 

3 (bases 1 to 2354) 

Lu,K., Lee,M.-H. and Patel,S.B. 

Direct Submission 

Submitted (16-MAY-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 
Sequence update by submitter 

On May 16, 2001 this sequence version replaced gi:12382299.
Location/Qualifiers
1..2354

/organism="Mus musculus" 

/mol_type="mRNA" 

/strain="C57BL/6" 

/db_xref="taxon:10090"

/tissue_type="liver"

1..2354

/gene="Abcg5"

139..2097

/gene="Abcg5" 

/note="ABCG5" 

/codon_start=1

/product="sterolin" 

/protein_id="AAG53097.1"

/db_xref="GI:12382300"

/translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG 
RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS 
SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD 
EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG 
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD 
IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS 
DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL 
APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF 
TFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
FLILYGFIPALVILGIVIFKVRDYLISR"



ORIGIN 



Query Match 60.2%; Score 1409.2; DB 10; Length 2354; 

Best Local Similarity 80.4%; Pred. No. 3.2e-296; 

Matches 1664; Conservative 0; Mismatches 403; Indels 3; Gaps 1;



Qy 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 57 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 116



Qy 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144

I I I I I I I I I I I I I I I I I I M M I I I I I I I I III II III

Db 117 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 176

Qy 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204

III II I I I I I I I II I I I I I I M I I I I II II I I I I II I I I I
Db 177 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 236



Qy 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAG 261

I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 237 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGG 296

Qy 262 GCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGT 321 

I I I I I I I I I MINI III II I I I I II I II I I I I I I M I M I I I I I II II M 
Db 297 GCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGT 356 

Qy 322 CTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAA 381 

I I I I I I I I I I I I I I I I II Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 357 CTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAA 416 

Qy 382 AACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGA 441 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 417 GACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGA 476 

Qy 442 GGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGT 501 

I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 477 GGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGT 536 

Qy 502 CCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGC 561 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I II 

Db 537 CCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC 596 

Qy 562 GCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCAT 621 

I I I I I I I I I II I I I I I II I I II II I I I II I I I I I I I I I I I I I I I 

Db 597 GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCAT 656 

Qy 622 GGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGG 681 

I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I III I II I I I I I 
Db 657 GACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGG 716 

Qy 682 CATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAA 741 

II I I I I I II M I I I I I I I I I II I M I I I I I I I I I I I II II I I I I I II II 

Db 717 AATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAA 776 

Qy 742 GGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGT 801 

I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 777 GGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGT 836

Qy 802 CGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCC 861 

I I III III II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 837 CCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCC 896 

Qy 862 CCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGAT 921 

II I I I I I I I I II II I III I I I I I I I I I I I I I I I I I I I I M I I I I I M I 
Db 897 TCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGT 956

Qy 922 TTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCC 981

M I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 957 GTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCC 1016 

Qy 982 TGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAA 1041

I II I II I I I II II I I I I I II II I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 1017 TGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAG 1076



Qy 1042 GGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATC 1101



II I I I I II M I I I I I I I Mil I I I I II I I II I I II I I II II I I I I MM 

Db 1077 AGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATC 1136

Qy 1102 AGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACC 1161 

I I I I I II I I I I I II I I II I I I I I I I I I I I I I I I II I I I I II M 

Db 1137 TGACATCTATCACAAAATTCTGGAGAACATTGAAAGAGCACGATACCTGAAAACCTTACC 1196

Qy 1162 AATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCT 1221

I I I I I I M I I I I I I I I I I I I I I I I I M I I I I I I I I II I I I I I I I I I 
Db 1197 CACGGTTCCTTTCAAAACAAAAGATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCT 1256

Qy 1222 GAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCA 1281 

I I I I I II I II I I I I I M II I I I I I I I I I I I M I I I I I M M II I I I I I MM 

Db 1257 GAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCA 1316 

Qy 1282 GAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT 1341 

I II I II I I II I II I I I II II I I I I II I I I II I I II I II I I II I III 
Db 1317 GAATCTGATCATGGGCCTCTTCCTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCT 1376 

Qy 1342 AAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTA 1401 

M M II III I II I I I II II II II II M II III I II II II II M I I II II 
Db 1377 AAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATA 1436 

Qy 1402 CACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGA 1461

III II I II I M II M I M II I I I I I M II M MM! MM I II II I II I II I I I 

Db 1437 CACCGGCATGCTCAATGCTGTGAATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGA 1496 

Qy 1462 GAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCT 1521 

I II II I I I I I II I I II II II II M II II I I II I I I II II I II II II M I I 

Db 1497 GAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCT 1556 

Qy 1522 CCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTT 1581 

II M II I I II I I I I II I I I II M I I M II I I II I I I M II I II II I M I I I 

Db 1557 CCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTT 1616 

Qy 1582 ACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAAT 1641 

II II II I II I I I I II M I II II I I I I II I I I II M II I I II I II I I I M I I I I 
Db 1617 GTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAAT 1676 

Qy 1642 TGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAG 1701

II I I M I I M M II I M I II I I II I II M I I II II II I II II II I I I II I I II 
Db 1677 TGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAG 1736 

Qy 1702 TGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACAT 1761 

I I I II II I M I I I I III I II I I II II II I I I II I II II I I II I II II M 

Db 1737 TATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACAT 1796 

Qy 1762 ACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAG 1821 

M II M II I M I I M II II I I MUM I II M I I I I M I I II I II I II II M I 

Db 1797 ACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTG 1856

Qy 1822 TGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGT 1881 

II II II II I II II M M I II I I II II I I I I II I II II II II II I II II 

Db 1857 TGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACAC 1916 

Qy 1882 TTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAAC 1941 

Mill II I I M I II I I II I I II I M II I I M M M I I I II I II I 



Db 1917 CTCTATGCTAAATCACCCGATGTGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAAC 1976



Qy 1942 CTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCC 2001

I I M I I M I I I I I I I I I I I I I I I M I I I I I II II M III II I I I II

Db 1977 CTGCCCAGGTGCTACATCCAGATTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCC 2036

Qy 2002 AGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTA 2061

I I I I I I II I II I I I I I I I II M I I I I I III I I I I I I I II I I I I I I M II

Db 2037 AGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATA 2096

Qy 2062 GTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091

II II II II I I I I I I I

Db 2097 GTTAAGATGACAGGCAGGAAAGGGTTAATG 2126
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AY195872 2351 bp mRNA linear ROD Ol-JUN-2003 

Mus musculus strain I/LnJ ATP-binding cassette sub-family G member
5 (Abcg5) mRNA, complete cds.
AY195872 

AY195872.1 GI:31322255

Mus musculus (house mouse) 
Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

1 (bases 1 to 2351) 

Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
Paigen,B. 

Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 
Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 
Mice 

Unpublished 

2 (bases 1 to 2351) 

Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
Direct Submission 

Submitted (11-DEC-2002) The Jackson Laboratory, 600 Main Street,
Bar Harbor, ME 04609, USA 

Location/Qualifiers 

1..2351

/organism="Mus musculus" 

/mol_type="mRNA" 

/strain="I/LnJ" 

/db_xref="taxon:10090"

/chromosome="17"

/map="55 cM" 

/sex="male" 

/tissue_type="liver"

1..2351

/gene="Abcg5"

139..2097

/gene="Abcg5" 

/note="ATP-dependent canalicular cholesterol transporter; 
white subfamily" 
/codon_start=1

/product="ATP-binding cassette sub-family G member 5" 



/protein_id="AAO45093.1"
/db_xref="GI:31322256"

/translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV 

SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG 

RLRCTGTLEGDVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTT^LALCRS 

SADFYNKKVEAVMTELSLSHVADQVIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD

EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG

TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFKESD 

IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV

QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQFVGATPYTGMLNAVNLFPMLRAVS 

DQESQDGLYHKWQMLLAYVLHALPFSIIATVIFSSVCYWTLGLYPEVARFGYFSAALL 

APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF 

TFQKYCCEILVVNEFYGLNFTCGESNTTMLNHPMCAITQGVEFIEKTCPGATSRFTAN

FLILYGFIPALVILGIVIFKVRDYLISR"



ORIGIN 



Query Match 60.1%; Score 1406; DB 10; Length 2351; 

Best Local Similarity 80.3%; Pred. No. 1.6e-295;

Matches 1662; Conservative 0; Mismatches 405; Indels 3; Gaps 1; 

Qy 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 57 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 116 

Qy 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 
Db 117 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 176

Qy 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

III II I II I M I I I I I I M M I I I M M I I I I I I I I I I I I 

Db 177 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 236

Qy 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAG 261

I I I I II I I I II I I II I I II I II I I I I I I I I I I I II II I I I II I I I 
Db 237 GGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGG 296

Qy 262 GCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGT 321

III I I II I I I I I I I I III II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 297 GCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGT 356 

Qy 322 CTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAA 381 

I I I I I I I II I I Mill II Mill I I II I M I I I I II I I I I II M I I I Mill 

Db 357 CTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAA 416 

Qy 382 AACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGA 441 

I II I I I I M I I II I I M I II M II I I I I I I II II II I II I I I I I II II I 

Db 417 GACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGTGCACTGGGACCCTGGAAGGGGA 476 

Qy 442 GGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGT 501 

I II I MINI III I I I I I I I II II I I I I I I II I I II I I I I II I I I II I II I 

Db 477 CGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGT 536 

Qy 502 CCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGC 561 

II II II M I I I I I I I II I I II II I I I I I I I I M II M M I III I II II M 

Db 537 CCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC 596 



Qy 562 GCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCAT 621

I 1 1 1 1 1 1 1 1 II I II 1 1 II I I II II I I 1 1 1 1 1 1 1 1 1 1 II I 1 1 1 1
Db 597

Qy 622 GGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGG 681

I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I II II I I I
Db 657 GACAGAGCTGAGCCTGAGCCACGTGGCAGACCAAGTGATTGGCAGCTATAATTTTGGGGG 716

Qy 682 CATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAA 741

I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I II II I I I I I II II
Db 717 AATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAA 776

Qy 742 GGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGT 801

I I I II I I II I I I I I I II I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I
Db 777 GGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCA?ATTGT 836

Qy 802 CGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCC 861

I I III III II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
Db 837 CCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCC 896

Qy 862 CCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGAT 921

II I I I I I I I I II II I III I I I I I I I I I I I I I I I I I I I I I I I II I I II I
Db 897 TCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGT 956

Qy 922 TTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCC 981

I I I I I I I I I I I I I I I III I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I
Db 957 GTTCTGTGGCACCCCAGAGGAGATGCTTGGATTCTTCAATAACTGTGGTTACCCCTGTCC 1016

Qy 982 TGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAA 1041

I I I I II I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I M I I II I I I I I I I
Db 1017 TGAACATTCCAATCCCTTTGATTTCTACATGGACTTGACATCAGTGGACACCCAAAGCAG 1076

Qy 1042 GGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATC 1101

II I II I I I I I I I I I I I I MM MM I II II I I II I I I II II I I II I II I I
Db 1077 AGAGCGGGAAATAGAAACATACAAGCGAGTACAGATGCTGGAATCTGCCTTCAAGGAATC 1136

Qy 1102 AGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACC 1161

I I I I II I II II I II II I I II I I II I I I I I II II II I II I II II
Db 1137 TGACATCTATCACAAAATTCTGGAGAACATTGAAAGAGCACGATACCTGAAAACCTTACC 1196

Qy 1162 AATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCT 1221

I I I II M I M II I I I II II II I II II I I I II I II I I II I I I II II
Db 1197 CACGGTTCCTTTTAAAACAAAAGATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCT 1256

Qy 1222 GAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCA 1281

II I I II I I II II I II I I II I II I II I I II I II I II I II I II II M I II II I I
Db 1257 GAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCA 1316

Qy 1282 GAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT 1341

II II II II II II I II I II I II I II II I I I I II I II II II I II I III
Db 1317 GAATCTGATCATGGGCCTCTTCCTCATTTTCTACCTTCTCCGAGTCCAGAACAACACGCT 1376

Qy 1342 AAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTA 1401

I I II II II II I II II M I II I I II II II II II II I II II I II II II
Db 1377 AAAGGGTGCTGTGCAGGACCGCGTGGGGCTGCTCTATCAGTTTGTGGGTGCCACCCCATA 1436

Qy 1402

Ml II I II I II M II I I M II I I II II I II I I I II I I II I II I II II II II II I
Db 1437 CACCGGCATGCTCAATGCTGTGAATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGA 1496

Qy 1462 GAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCT 1521

I I I I I I I I I I I I I I II II I I I I I I I I I I I I II I I I I II I I II II I I Ml
Db 1497 GAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCTGCTCGCCTACGTGCTACACGCGCT 1556

Qy 1522 CCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTT 1581

I II I II I II I I I I I I I I I I I I I I I I I I I I II I II I I II I I I I I I I I I I I I I
Db 1557 CCCCTTCAGCATCATCGCCACGGTGATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTT 1616

Qy 1582 ACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAAT 1641

I I I I I M M I I I I I I I I I I I M I I I I I I I I I I I II I I I I II I I I I I I II I I I I
Db 1617 GTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAAT 1676

Qy 1642 TGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAG 1701

III I I I I I I I M I I II I I II I I I I I I I I M I III II I I I I I II I I I I I I
Db 1677 TGGAGAATTTCTAACACTTGTGCTACTTGGTATAGTCCAAAACCCTAATATTGTCAACAG 1736

Qy 1702 TGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACAT 1761

I I I I I I I I I I I M I III III I I I I I I I I II II I M I I I M I I I I I I I I
Db 1737 TATAGTGGCTCTGCTCAGCATCTCTGGACTGCTTATTGGATCTGGATTTATCAGAAACAT 1796

Qy 1762 ACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAG 1821

I I I I I I I II I II I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I
Db 1797 ACAAGAAATGCCTATTCCTTTAAAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTG 1856

Qy 1822 TGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGT 1881

I I I I I I I M II I I I I II II I I I I I I I I I II I I I I I I II I I I I I II II
Db 1857 TGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAACTTCACTTGTGGTGAATCCAACAC 1916

Qy 1882 TTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAAC 1941

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1917 CACTATGCTAAATCACCCGATGTGCGCCATCACCCAAGGGGTCGAGTTCATCGAGAAAAC 1976

Qy 1942 CTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCC 2001

I I I I I I I I I I M I I I I I I I I I I I I I Mill II II II I I I II II I II
Db 1977 CTGCCCAGGTGCTACATCCAGATTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCC 2036

Qy 2002 AGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTA 2061

I I I I I I I I I II II I I I II I I I I I I I I I I M I I II I I I I I II I I I I II I I
Db 2037 AGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATA 2096

Qy 2062

I I I I I I I II I I I III
Db 2097
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AX320881 
2258 bp DNA linear PAT 14-DEC-2001
Sequence 2 from Patent WO0179272.
AX320881
AX320881.1 GI:17902431

Mus musculus (house mouse)
Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

REFERENCE 1
AUTHORS Tian,H., Schultz,J. and Shan,B.
TITLE Sitosterolemia susceptibility gene (ssg): compositions and methods
of use
JOURNAL Patent: WO 0179272-A 2 25-OCT-2001;
Tularik Inc. (US)

FEATURES Location/Qualifiers

source 1..2258

/organism="Mus musculus"
/mol_type="unassigned DNA"
/db_xref="taxon:10090"

/note="mouse sitosterolemia susceptibility gene (SSG)"

CDS 47..2005

/note="unnamed protein product; mouse sitosterolemia

susceptibility gene (SSG) protein"

/codon_start=1

/protein_id="CAD19408.1"

/db_xref="GI:17902432"

/db_xref="REMTREMBL:CAD19408"

/translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV 

SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG 

RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS 

SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD

EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG 

TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD 

IYHKILENIERARYLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV

QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS 

DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL 

APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF

TFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN

FLILYGFIPALVILGIVIFKVRDYLISR"



ORIGIN 



Query Match 59.6%; Score 1395.6; DB 6; Length 2258; 

Best Local Similarity 80.7%; Pred. No. 3e-293; 

Matches 1642; Conservative 0; Mismatches 389; Indels 3; Gaps 1;



Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I
Db 1 GGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCC 60

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

I I I I I III II III III II I I I II I I M II I I I II I I I I I
Db 61 CTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCT 120

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGCATCCTCCATGCCTC 237

I I I I I I I I II I I I I II I I I I I I I II I M II II I M I II
Db 121 GGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTC 180

Qy 238 CTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTG 297

I I I I I I I I I I I M I I I I II MM MUM I II II I III II I I MM MM
Db 181 CTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTG 240

Qy 298 GACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCAT 357

I I II M I I II II II M II II M II M II I M I Mill M II I II II II II II
Db 241 GGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCAT 300



Qy 358 CCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGG 417

I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 301 CTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCG 360

Qy 418 GCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCA 477

I I I I I I M I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I II I I II
Db 361 GCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCA 420

Qy 478 GTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGT 537

I I II II II I I I I I I II II I I I I I I I I I M I II I I I I I I I M I I I I I I I I I I II
Db 421 GTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGT 480

Qy 538 GCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTT 597

I I I I I II I I I M I I I II I II I I I I II I I I I I I II M I I I I I I I I
Db 481 GCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTA 540

Qy 598 CCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACT 657

I I I II II M I I I I I I I I I I I I I I I I I I II I I I I M I I I I I II II I I I I I I
Db 541 CAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAAT 600

Qy 658 GATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGC 717

I I I I I I M III I II I I II I I I I I I II II I I I I I II I I II II I I I I I I I I
Db 601 GATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGC 660

Qy 718 AGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGA 777

I I I I I I II II I I I I I II I I I IN I M II I I I I I I II I I I I I I I I M Mill
Db 661 AGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGA 720

Qy 778 CTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAAT 837

I II I I II I I I I I I I I I I I I I I I I I III III II I I M I I I I I I I I I I I I I I I
Db 721 CTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAAT 780

Qy 838 TGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGC 897

I I I I II I I M I I I I I I I I I I I II I I I I I I II II II I Ml II I II I II I I I
Db 781 TGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGC 840

Qy 898 CATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTT 957

M I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I III I I II I I I II I I I
Db 841 CATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTT 900

Qy 958 CAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCT 1017

II II I II I I I I I I I II M I I I II I I I I M I II M I II I I II II M II I I I
Db 901 CAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTT 960

Qy 1018 GACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGAT 1077

III I I II I I II II I M II II I II I I II I I M I I I I I I I I I I I I I II II I I I
Db 961 GACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGAT 1020

Qy 1078 GATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAG 1137

I I MM I II II I I II II I I I II I III II I I I II I II I I II M I II
Db 1021 GCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTGGAGAACATTGAAAG 1080

Qy 1138 AATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGT 1197

I I I I M M II I I I I II I I M I I I I II II II II II I II I I I M I II II I
Db 1081 AGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAAGATCCTCCTGGGAT 1140



Qy 1198 TTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCT 1257 

III M II I I I I I II M I I I I I I I I I I I I I I I I I I I MM M I II I I 
Db 1141 GTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCA 1200

Qy 1258 GGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGT 1317 

I II II M I II I II I II II I II I II I I I I I I II M M I I I I II M II M II I I 
Db 1201 GGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTCCTCATTTTCTACCT 1260 

Qy 1318 TCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTA 1377 

I II I I I II I MM I II I I I I I I I II I II II I II I I II II I I II I I 
Db 1261 TCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTA 1320 

Qy 1378 CCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGT 1437

Ml II I II I I I II I I II I Mill I I I I I I II II I II I I I II I II I II I I II I 

Db 1321 TCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTGAATCTGTTTCCCAT 1380 

Qy 1438 GCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGAT 1497 

I II I I I I I II I I II I I I II II I II I I I I II I I II I I II M II I II M II I M I 
Db 1381 GCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCT 1440 

Qy 1498 GCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAG 1557 

Ml Mill I II I I II I II II II II I II II I I I I II II I II II II II II I 
Db 1441 GCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAG 1500 

Qy 1558 TGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGC 1617 

I II I I I II I I I I I I I I II I I I I I I II II I II I II II II I I I I M II I Mill 
Db 1501 TGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGC 1560 

Qy 1618 TGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGT 1677 

I II I II I M II I I I I II I II II I II II II II II I II I II I M II II I II I I I II 

Db 1561 TGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGT 1620

Qy 1678 CCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGT 1737

II I I M I I II II I I II II II I I I I II I II II II I I III I II I II II I I 

Db 1621 CCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTAT 1680 

Qy 1738 TGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTA 1797 

I I II II I I II I I I II II II M I I II II I I M I II II I I II II I I I I II I I I II I 
Db 1681 TGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTA 1740 

Qy 1798 TTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAA 1857 

I II II II II I M I I II II II I I II I M I II M I I II I II I II I Mill II II I 
Db 1741 TTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAA 1800 

Qy 1858 TTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCA 1917 

I I I M I II II I I I I I I I I I I I I I I I II I II I I II I II II II 

Db 1801 CTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATGTGCGCCATCACCCA 1860 

Qy 1918 AGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTT 1977

III I II Mill I II I I II I II I II I I II I II II II I I II I II I I II II I 

Db 1861 AGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGATTCACGGCAAACTT 1920 

Qy 1978 TCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAAT 2037 

II II M III I II II I I M I I II II II I I I II I II I I I II II I I Ml I 

Db 1921 CCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGT 1980 



Qy 2038 AAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091

Db 1981 CAGGGACTACCTGATTAGCAGATAGTTAAGATGACAGGCAGGAAAGGGTTAATG 2034
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AF312714 2470 bp mRNA linear ROD 26-AUG-2002 

Rattus norvegicus sterolin (Abcg5) mRNA, complete cds.

AF312714 

AF312714.3 GI:22477143

Rattus norvegicus (Norway rat) 
Rattus norvegicus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.

1 (bases 1 to 2470) 

Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
Salen,G., Dean,M. and Patel,S.B.

Identification of a gene, ABCG5, important in the regulation of
dietary cholesterol absorption

Nat. Genet. 27 (1), 79-83 (2001)

20578753 

11138003 

2 (bases 1 to 2470) 

Lu,K., Lee,M.-H. and Patel,S.B. 
Direct Submission 

Submitted (12-OCT-2000) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 

3 (bases 1 to 2470) 

Lu,K., Lee,M.-H. and Patel,S.B. 
Direct Submission 

Submitted (16-MAY-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 
Sequence update by submitter 

4 (bases 1 to 2470) 
Lu,K., Lee,M. and Patel,S.B. 
Direct Submission 

Submitted (26-AUG-2002) Division of Endocrinology, Diabetes and 
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 
Sequence update by submitter 

On Aug 26, 2002 this sequence version replaced gi:14091945.
Location/Qualifiers
1..2470

/organism="Rattus norvegicus"
/mol_type="mRNA"
/strain="Sprague-Dawley"
/db_xref="taxon:10116"
/tissue_type="liver"
1..2470
/gene="Abcg5"
65..2023
/gene="Abcg5"
/note="ABCG5"
/codon_start=1
/product="sterolin"
/protein_id="AAG53098.3"
/db_xref="GI:22477144"

/translation="MSELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNV
SFSVSNRVGPWWNIKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISG
RLRRTGTLEGEVFVNGCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSS
SADFYDKKVEAVLTELSLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLD
EPTTGLDCMTANHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCG
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSD
ICHKILENIERTRHLKTLPMVPFKTKNPPGMFCKLGVLLRRVTRNLMRNKQVVIMRLV
QNLIMGLFLIFYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
DQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARFGYFSAALL
APHLIGEFLTLVLLGMVQNPNIVNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYF
TFQKYCCEILVVNEFYGLNFTCGGSNTSVPNNPMCSMTQGIQFIEKTCPGATSRFTTN
FLILYSFIPTLVILGMVVFKVRDYLISR"



ORIGIN 



Query Match 59.1%; Score 1383.8; DB 10; Length 2470; 

Best Local Similarity 80.0%; Pred. No. 1.1e-290;

Matches 1641; Conservative 0; Mismatches 407; Indels 3; Gaps 1; 

Qy 44 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 103

I I I I I I I I I I I I I I I I I I I I I III III I I I I I I I I I I I 
Db 2 TAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCACTCACATTTGCTTCCCGCTG 61 

Qy 104 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 163 

I I I I I I I I II M I I I I M III II III III II MINI 

Db 62 GCCATGAGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAACAGA 121 

Qy 164 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 220 

II II I I I I II I I II I II I I I I II I I I I I I I I I I I I I I I I I 

Db 122 GGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAGCTTA 181 

Qy 221 GGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACA 280

II MM Ml II II II I M I I II I I I I I I I M II I I I I II I I I I I I I I 

Db 182 GGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTCGGGCCCTGGTGGAACATCAAA 241 

Qy 281 TCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGC 340

II MM II I I Mill MM I II M II II I I II I II I II I I I II I I I I II I 

Db 242 TCATGCCAGCAGAAGTGGGACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGT 301 

Qy 341 GGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCC 400

II II II M I I I M I II II II I I II I II I I II I M II M M I I M II II II I II M 
Db 302 GGCCAGACCATGTGCATCTTAGGTAGCTCAGGCTCAGGGAAAACCACGCTGCTGGACGCC 361 

Qy 401 ATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGG 460

II II I II II II I I I II II I M Mill I II I M I I II II M I 

Db 362 ATCTCTGGGAGGCTGCGGCGCACAGGGACCTTGGAAGGGGAAGTGTTTGTGAACGGCTGC 421 

Qy 461 GCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTG 520

I II I I I II I I II I II II I I I II II I I I I II I II I M II I II II I I II I I 
Db 422 GAGCTGCGCAGGGACCAGTTCCAAGACTGCGTCTCCTACCTCCTGCAGAGCGATGTCTTT 481 



Qy 521 CTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGC 580

I II II II II II II I Mill I II I I M I I M I I I'l M II I I I II Mill II
Db 482 CTGAGCAGCCTCACGGTGCGGGAGACGCTGAGATACACGGCGATGCTGGCTCTCCGCAGC 541



Qy 581 GGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGC 640 

II I I II II I I I I I I II I I I I I I I III II I I I I I II I I I II M I II 

Db 542 AGCTCCGCGGACTTCTACGACAAGAAGGTAGAGGCAGTCCTGACAGAGCTGAGTCTGAGC 601 

Qy 641 CATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGG 700 

II I I II I I I I I I I I II I I I I I I II I I II I I I I I I I I I I I I II I II I II 

Db 602 CACGTGGCAGACCAAATGATCGGCAACTATAATTTTGGGGGGATTTCCAGTGGCGAGCGG 661 

Qy 701 CGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAG 760

I I I II II I I II I I I I M I I I I M II I I I I I II I I I I I II II II I I I I III 
Db 662 CGCCGAGTGTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTTGACGAG 721 

Qy 761 CCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTG 820 

I I I I I I I I I M I I I I I I I I I M I I I I M I I I M II III Mill I I I I II III 
Db 722 CCAACCACAGGACTGGACTGCATGACTGCAAATCATATCGTCCTCCTCTTGGTCGAGCTG 781 

Qy 821 GCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAG 880 

I I I I I I I I I I I I I I I II I I II I I I M I I I II I I II I II I I I I I I I I I I II 
Db 782 GCTCGCAGGAACCGCATTGTAATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAC 841

Qy 881 CTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCG 940

I Ml M II II II I I I I II I I II I I II I I M II I I II II I I I I II I II I I I 

Db 842 CACTTCGACAAAATTGCCATTCTGACTTACGGAGAGTTGGTGTTCTGTGGCACGCCAGAG 901 

Qy 941 GAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTT 1000

II Mill I M I I I I I I I I II I I I I II M I I II I I II II I I I I I II II III 

Db 902 GAGATGCTCGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTT 961 

Qy 1001 GACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACC 1060

II II I II I I II I I I I I I II I I I I I II I I II I II I II II II Mill II 
Db 962 GATTTCTACATGGACTTGACATCGGTGGACACCCAAAGCAGAGAGCGAGAGATAGAGACG 1021 

Qy 1061 TCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACT 1120

I I II I I II I II I II I I I II II II I I I I II I I II I I II I II II MM 

Db 1022 TACAAGCGAGTCCAGATGCTGGAATCTGCCTTCAGGCAATCGGACATCTGTCACAAAATC 1081 

Qy 1121 TTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACC 1180 

II MM II II I M II I I II I II I I II I I I II II I I I I II I I II I I M I II 
Db 1082 CTGGAGAACATTGAAAGAACAAGACACCTGAAAACCCTACCCATGGTTCCTTTCAAAACG 1141

Qy 1181 AAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAAC 1240

II I II I I II III I MM II II II II I II I I II M I I II I I II II II II 

Db 1142 AAAAATCCTCCCGGAATGTTCTGCAAGCTCGGCGTTCTCCTGAGGAGAGTAACGAGAAAC 1201

Qy 1241 TTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTG 1300

I I I I I II I II II II I I I II II I II I II II II II I I I I II I I I M II I II 
Db 1202 CTAATGAGGAATAAGCAGGTGGTGATTATGCGTCTTGTTCAGAATCTGATCATGGGTCTG 1261 

Qy 1301 TTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGAC 1360 

M II M I I I I I I I II M I I II II MM M M I II II I I I I I M I I I 

Db 1262 TTCCTCATTTTCTACCTTCTCCGAGTCCAGAACAACATGCTGAAGGGCGCTGTTCAGGAC 1321 

Qy 1361 CGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCT 1420 

II I II II I II I II I II I II II M I I I I M I II I I I II I II M II II I II II I 

Db 1322 CGCGTAGGGCTGTTGTACCAGCTTGTGGGTGCCACCCCGTACACCGGCATGCTCAACGCT 1381 



Qy 1421 GTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTAC 1480

I I I I I II I I I I I I II II I I II I I I II II M I I I II I I I I I I I I II I I I II I Ml
Db 1382 GTGAACCTCTTTCCCATGCTGAGAGCTGTCAGCGACCAGGAGAGTCAGGATGGCCTGTAC 1441

Qy 1481 CAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCC 1540

I I I I I I I I I I I I I M I I I I I I II I II I I II I I I I I I I I I I M I I I I I M M
Db 1442 CAGAAGTGGCAGATGCTGCTCGCCTATGTGCTGCATGCTCTCCCCTTCAGCATCGTTGCC 1501

Qy 1541 ACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGA 1600

II I I I I I I M I I I I I I I II I I I I I I I II I I I I II I I II II I I I I I III II
Db 1502 ACGGTGATTTTCAGCAGCGTGTGTTACTGGACTCTGGGCTTGTATCCCGAGGTCGCCAGA 1561

Qy 1601 TTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTT 1660

I I I II I I I II I I I II Mill II I II I M II I I II I II I I I I II I I M II III
Db 1562 TTTGGATACTTCTCTGCCGCTCTGTTGGCCCCTCACTTAATTGGAGAATTTCTGACACTT 1621

Qy 1661 GTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCC 1720

I I M I M I II I I I I II I I I II II I I I II II I I II M I II M I M I I M I I
Db 1622 GTGCTGCTTGGTATGGTCCAAAACCCCAATATTGTCAACAGCATAGTGGCTCTGCTGAGT 1681

Qy 1721 ATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCT 1780

III I III I I I I II I I I I I II I II I II II I II I I II I I I I II I I I II II I II I
Db 1682 ATTTCTGGGTTGCTCATTGGATCTGGATTTATCAGAAACATAGAAGAAATGCCCATTCCT 1741

Qy 1781 TTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAAT 1840

II I I I I I I I MM I I I I I I I M II I I II II II I I II I II II I I I I I II I
Db 1742 TTAAAAATCCTGGGTTACTTTACCTTCCAAAAGTACTGTTGTGAGATTCTTGTGGTCAAT 1801

Qy 1841 GAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCA 1900

I I I I II I I II Mill I II I I I I II I I II II II I II II I I III III III
Db 1802 GAGTTCTATGGCCTGAACTTCACTTGTGGTGGCTCCAACACTTCTGTGCCAAATAACCCA 1861

Qy 1901 ATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCT 1960

II I M I II I II I II I I II II M I I I I I II I II I II II II I I II I II M II
Db 1862 ATGTGTTCCATGACCCAAGGGATCCAATTCATTGAGAAAACCTGCCCAGGGGCCACGTCC 1921

Qy 1961 AGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGA 2020

I I I I I I I I I I II II II I I I I II I I II II II I I I II I I II M I II I M
Db 1922 AGATTCACGACAAACTTCCTGATCTTGTACTCGTTCATCCCGACTCTTGTCATCCTGGGG 1981

Qy 2021 ATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGA 2080

I I I I M I I I II I MM I II I II I II II M I II I III
Db 1982 ATGGTGGTCTTTAAAGTCCGGGACTACCTGATTAGCAGATAGGTAAGATGGCAGGCAGGA 2041

Qy 2081 AAATGGAAGTG 2091

II I III
Db 2042 AAGGGTTAATG 2052
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AX685729 1959 bp 

Sequence 1 from Patent WO02081691.
AX685729

AX685729.1 GI:29371738

DNA linear PAT 29-MAR-
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SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 1 17-OCT-2002;
            Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES    Location/Qualifiers
  source    1..1959
            /organism="Mus musculus"
            /mol_type="unassigned DNA"
            /db_xref="taxon:10090"
  CDS       1..1959
            /note="unnamed protein product; ABCG5 (mABCG5)"
            /codon_start=1
            /protein_id="CAD86570.1"
            /db_xref="GI:29371739"
            /db_xref="REMTREMBL:CAD86570"
            /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
            SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
            RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
            SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD
            EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
            TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD
            IYHKILENIERARYLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
            QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
            DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL
            APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
            TFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
            FLILYGFIPALVILGIVIFKVRDYLISR"



ORIGIN 



Query Match 58.4%; Score 1365.4; DB 6; Length 1959;
Best Local Similarity 81.4%; Pred. No. 1.1e-286;
Matches 1595; Conservative 0; Mismatches 361; Indels 3; Gaps 1;



Qy 107 ATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGC 166

I I I I I I I I I I I I I I I I III II III III II I I I I I I I I I
Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60

Qy 167 TCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTGGGC 223

II I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120

Qy 224 ATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCT 283

II I I I I I I I II I I I I I I I I I I I I MM II I II I II I II I M I M I III
Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCA 180

Qy 284 TGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGG 343

MM II II Mill II II II II II II II II I I II II II M I II I I Mill II
Db 181 TGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGC 240



Qy 344 CAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATG 403

Mill I II II II I I I I II M II II I II I I Mill II II M II II II I II II II r
Db 241 CAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATC 300



Qy 404 TCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCG 463 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I III I I I 

Db 301 TCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAG 360 

Qy 464 CTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTG 523 

II II II I I I I I I I I I I II II I I I I I I II I II M II I I I I I II II II II I III 

Db 361 CTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTG 420

Qy 524 AGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGC 583 

I I I I I I I I I I I I I I I I I II I I I I III I I M I III I I I I I I I I II I I I I I II 

Db 421 AGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGC 480 

Qy 584 AATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCAT 643 

I I II II I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 TCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCAC 540 

Qy 644 GTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGC 703 

I I I I I I II I I I I I I I I I I I III I II I I I I I I I I I I I I II I I I I I I I I I 

Db 541 GTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGC 600

Qy 704 CGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCA 763 

II II I I I I I I I I I I I I I I II II I I I I I II I I I I I I I I I II I I M I I I I I I 

Db 601 CGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCA 660 

Qy 764 ACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCT 823 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M M I III III II I I I I M 
Db 661 ACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCT 720 

Qy 824 CGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883

I I I I I I I I I I I I I M I I II I I I I I I I I I I I I I I I II I I I I I I I I II II I I 

Db 721 CGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACAC 780 

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943

II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I M I I I I I I I I III 

Db 781 TTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAG 840 

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003

I I I I I I I I I I I I M I I MM II I I II I I II II I I I I I II I II II II II I II 

Db 841 ATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGAT 900 

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCC 1063 

II II I II II I I II I II I II I II I I I I I I I II I II II I M I I I II II I I I I 

Db 901 TTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTAC 960 

Qy 1064 AAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTG 1123 

II I I II I I II II I I I II I I M I I I I II I I I I I II I II I II I I I I I 

Db 961 AAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTG 1020 

Qy 1124 AAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAA 1183 

I II I I I II I I I I I I II II II I I I I II I II II M I I I II I I II II I I III 

Db 1021 GAGAACATTGAAAGAGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAA 1080 

Qy 1184 GATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTG 1243 

III II I II I I I III II II I I I I I II II II I I II II II I II I I II II 

Db 1081 GATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTA 1140 



Qy 1244 GTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTC 1303

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I III 
Db 1141 ATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTC 1200 

Qy 1304 CTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGC 1363 

I I I I I I I I I I I I II I I I I II MM M I I I II II I I I I I I I I I I I II 

Db 1201 CTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGC 1260

Qy 1364 GTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTG 1423 

II II II II II III I I I I I I I I M I M I I I I I II I I I I I I I I II I I I I II 

Db 1261 GTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTG 1320 

Qy 1424 AATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAG 1483

I I I II I I M I I I II I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II II 

Db 1321 AATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCAT 1380 

Qy 1484 AAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACC 1543 

I I I I I I II II I I I I I I I I I II I II I I I I I II M I I I I I I I I I I I I I I I I I 

Db 1381 AAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACG 1440 

Qy 1544 ATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTT 1603

I I I I I I I I I I II I I II II II I II I I I I I I I I I I I I I I I I I I I I I I I Mill 
Db 1441 GTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTT 1500 

Qy 1604 GGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTG 1663 

II I I II II II II II I I II I II I I I I I I II I II II I II I II II I I I I M I I I II M 

Db 1501 GGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTG 1560 

Qy 1664 CTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATT 1723 

II II II I I I I I II II II I II Mill II II II II I II I M II I II I II III 
Db 1561 CTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATC 1620

Qy 1724 GCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT 1783

I Ml II I I I I I I M M I M I M II II II M II I II I II I II I II II II I I II I 

Db 1621 TCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTA 1680 

Qy 1784 AAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAG 1843

II II II I M II M I I II I II I I II I II I I II I II I I I II II II I II I II M I 
Db 1681 AAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAG 1740

Qy 1844 TTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATG 1903 

II II I II I I I I I I II I II II I II I I I II Mill M I I M I I I 

Db 1741 TTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATG 1800 

Qy 1904 TGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGA 1963 

II III I I II II II I I II I II I I I I I II II I I M II II II I I I II I I I III 
Db 1801 TGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGA 1860 

Qy 1964 TTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATA 2023

Mill II II I II II II III II II I I I II II M M II I M II I II II I 

Db 1861 TTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATA 1920 

Qy 2024 GTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 2062

II II M M I I II II I I II I II II I II I I I 
Db 1921 GTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATAG 1959 
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AX456526 2035 bp DNA linear PAT 06-JUL-2002

Sequence 48 from Patent WO0227016.

AX456526

AX456526.1 GI:21715414

synthetic construct
synthetic construct
artificial sequences.
1

Patel,S.B. and Dean,M.

  TITLE     Gene involved in dietary sterol absorption and excretion and uses
            therefor
  JOURNAL   Patent: WO 0227016-A 48 04-APR-2002;
            THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
            Shailendra B. (US); Dean, Michael (US)
FEATURES    Location/Qualifiers
  source    1..2035
            /organism="synthetic construct"
            /mol_type="unassigned DNA"
            /db_xref="taxon:32630"
            /note="Pirmer"
ORIGIN



Query Match 58.2%; Score 1363; DB 6; Length 2035;
Best Local Similarity 80.6%; Pred. No. 3.7e-286;
Matches 1607; Conservative 0; Mismatches 385; Indels 3; Gaps 1;



Qy 100 GTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAA 159

I I I I I I I I I I I I I I I I I I I I I I III II III Ml II M
Db 1 GCTGGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAA 60

Qy 160 CAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAG 216

I I I I II I I I I I I I I I I II I I I I I I I II I I I I I I M I I Mill
Db 61 CAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAG 120

Qy 217 CCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACAT 276

I I II I I II Ml I I II II I M I I II I II II I I I II I I I I I I I I I M I
Db 121 CTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTCGGGCCCTGGTGGAACAT 180

Qy 277 CACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGA 336

II Ml II I I II I I Mill MM I II II II I II II II II II I I II II I I II
Db 181 CAAATCATGCCAGCAGAAGTGGGACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGA 240

Qy 337 GAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGA 396

III II II I I I II I II I I I I MM M I II I I II II II II I II II I M II II M I I
Db 241 GAGTGGCCAGACCATGTGCATCTTAGGTAGCTCAGGCTCAGGGAAAACCACGCTGCTGGA 300

Qy 397 CGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGG 456

II II M II II II II II I I II I I I II I II II I Mill I I I I I II I I II M
Db 301 CGCCATCTCTGGGAGGCTGCGGCGCACAGGGACCTTGGAAGGGGAAGTGTTTGTGAACGG 360

Qy 457 CCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACAC 516

I I I II I M I I II I I I I II II I I I I I M I II I M I M II I II I II M I M
Db 361 CTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCGTCTCCTACCTCCTGCAGAGCGATGT 420

Qy 517 CCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCG 576

I I I I M M I I I I I I M Mill I I I II I I I I I I II I III I I I I M I MM
Db 421 CTTTCTGAGCAGCCTCACGGTGCGGGAGACGCTGAGATACACGGCGATGCTGGCTCTCCG 480

Qy 577 CCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCT 636

I II M I I II II I I I I I II I I I I I I II III M I II I II I I I I I II
Db 481 CAGCAGCTCCGCGGACTTCTACGACAAGAAGGTAGAGGCAGTCCTGACAGAGCTGAGTCT 540

Qy 637 GAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGA 696

II II I I I I II II II I II I II 1 I I I II I I I I II I II MM
Db 541 GAGCCACGTGGCAGACCAAATGATCGGCAACTATAATTTTGGGGGGATTTCCAGTGGCGA 600

Qy 697 GCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGA 756

II I I I I M I M I II I I II I II I I I I M II I I I II II I I II I II M II MM
Db 601 GCGGCGCCGAGTGTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTTGA 660

Qy 757 TGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGA 816

I I I I I II I I II I I I I II I II I I I I II II I II I II I I M III II II I I II I II
Db 661 CGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCATATCGTCCTCCTCTTGGTCGA 720

Qy 817 ACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTT 876

II I II I II I I II II I II II I I M I II I M I II I I II II I I I II I II
Db 721 GCTGGCTCGCAGGAACCGCATTGTAATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTT 780

Qy 877 TCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCC 936

II I III M I I M II I I II I I I I I I I II II I I I II I I II II II I II I I II
Db 781 CCACCACTTCGACAAAATTGCCATTCTGACTTACGGAGAGTTGGTGTTCTGTGGCACGCC 840

Qy 937 AGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCC 996

II III Mill I I II II I II I I I I I II II I I II I II I I I II II II I I II II
Db 841 AGAGGAGATGCTCGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCC 900

Qy 997 TTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGA 1056

I I II I I II II I I I I II II II II II I II M I II I I II I II II II II II I
Db 901 CTTTGATTTCTACATGGACTTGACATCGGTGGACACCCAAAGCAGAGAGCGAGAGATAGA 960

Qy 1057 AACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAA 1116

II I I I I I I II II I II I II I II II I I I II I II I MM I II I I I II II
Db 961 GACGTACAAGCGAGTCCAGATGCTGGAATCTGCCTTCAGGCAATCGGACATCTGTCACAA 1020

Qy 1117 AACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAA 1176

II M I I II II I II II II I I II I I I II II M I I II I I I I II II I I I I M I
Db 1021 AATCCTGGAGAACATTGAAAGAACAAGACACCTGAAAACCCTACCCATGGTTCCTTTCAA 1080

Qy 1177 AACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAG 1236

III III II I I II III I II I I II II II II I I II I II I I I I I II I II II
Db 1081 AACGAAAAATCCTCCCGGAATGTTCTGCAAGCTCGGCGTTCTCCTGAGGAGAGTAACGAG 1140

Qy 1237 AAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGG 1296

II I I I II I I II I II I I II I I II I I I I I I II I I I II I I II I I II I M I I II
Db 1141 AAACCTAATGAGGAATAAGCAGGTGGTGATTATGCGTCTTGTTCAGAATCTGATCATGGG 1200

Qy 1297 TTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCA 1356

I I I I I M I I II II I I I II II II I I I I I III I I II Mill III I II
Db 1201 TCTGTTCCTCATTTTCTACCTTCTCCGAGTCCAGAACAACATGCTGAAGGGCGCTGTTCA 1260



Qy  1357 GGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAA 1416
         |||||||||||| ||  | |||||| ||||||| |||||||||||||| |||||||| ||
Db  1261 GGACCGCGTAGGGCTGTTGTACCAGCTTGTGGGTGCCACCCCGTACACCGGCATGCTCAA 1320

Qy  1417 CGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCT 1476
         ||||||||| || |||||| ||||| |||||||||||||||||||||||||||| |||||
Db  1321 CGCTGTGAACCTCTTTCCCATGCTGAGAGCTGTCAGCGACCAGGAGAGTCAGGATGGCCT 1380

Qy  1477 CTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGT 1536
          |||||||||||||||||| |||| |||||||  ||||| |  |||||||||||| | ||
Db  1381 GTACCAGAAGTGGCAGATGCTGCTCGCCTATGTGCTGCATGCTCTCCCCTTCAGCATCGT 1440

Qy  1537 TGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGC 1596
         ||||||  ||||||||||||| ||||| |||||||| ||||||||  |||| ||||| ||
Db  1441 TGCCACGGTGATTTTCAGCAGCGTGTGTTACTGGACTCTGGGCTTGTATCCCGAGGTCGC 1500

Qy  1597 CCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAAC 1656
         | |||||||||| || ||||| ||||| |||||||| ||||||||||| |||||||| ||
Db  1501 CAGATTTGGATACTTCTCTGCCGCTCTGTTGGCCCCTCACTTAATTGGAGAATTTCTGAC 1560

Qy  1657 TCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCT 1716
          |||||||| |||||||| |||||||| || ||||| ||||||||  |||||||||||||
Db  1561 ACTTGTGCTGCTTGGTATGGTCCAAAACCCCAATATTGTCAACAGCATAGTGGCTCTGCT 1620

Qy  1717 GTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCAT 1776
         |   ||| | ||| ||||  |||||||||||||  ||||||||||| |||||||||||||
Db  1621 GAGTATTTCTGGGTTGCTCATTGGATCTGGATTTATCAGAAACATAGAAGAAATGCCCAT 1680

Qy  1777 TCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGT 1836
         |||||| |||||| |  |||| ||||| |||||||| || ||  ||||||||||||| ||
Db  1681 TCCTTTAAAAATCCTGGGTTACTTTACCTTCCAAAAGTACTGTTGTGAGATTCTTGTGGT 1740

Qy  1837 CAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAA 1896
         |||||||||||| || ||||| |||||||||||  |||| ||   ||||||| ||| |||
Db  1741 CAATGAGTTCTATGGCCTGAACTTCACTTGTGGTGGCTCCAACACTTCTGTGCCAAATAA 1800

Qy  1897 TCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAAC 1956
          ||||||||| || | || ||||| || |||||||||||||||||||||||||| || ||
Db  1801 CCCAATGTGTTCCATGACCCAAGGGATCCAATTCATTGAGAAAACCTGCCCAGGGGCCAC 1860

Qy  1957 ATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCT 2016
          || |||||||| |  ||||| ||||| ||||| || || || ||  |||||||||||||
Db  1861 GTCCAGATTCACGACAAACTTCCTGATCTTGTACTCGTTCATCCCGACTCTTGTCATCCT 1920

Qy  2017 AGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCT 2076
          || || || || || ||| |  ||||  | || |||||||| |||  ||       |
Db  1921 GGGGATGGTGGTCTTTAAAGTCCGGGACTACCTGATTAGCAGATAGGTAAGATGGCAGGC 1980

Qy  2077 GGGAAAATGGAAGTG 2091
          |||||  |  | ||
Db  1981 AGGAAAGGGTTAATG 1995



RESULT 15 
AX456523 

LOCUS AX456523 1915 bp DNA linear PAT 06-JUL-2002 

DEFINITION Sequence 45 from Patent WO0227016. 



ACCESSION AX456523 

VERSION AX456523.1 GI:21715412 

KEYWORDS 

SOURCE synthetic construct 

ORGANISM synthetic construct 

artificial sequences. 
REFERENCE 1 

AUTHORS Patel,S.B. and Dean,M. 

TITLE Gene involved in dietary sterol absorption and excretion and uses 

therefor 

JOURNAL Patent: WO 0227016-A 45 04-APR-2002; 

THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel, 
Shailendra B. (US); Dean, Michael (US) 
FEATURES Location/Qualifiers 
source 1..1915 
/organism="synthetic construct" 
/mol_type="unassigned DNA" 
/db_xref="taxon:32630" 
/note="Primer" 

ORIGIN 

Query Match 57.1%; Score 1335.8; DB 6; Length 1915; 

Best Local Similarity 81.5%; Pred. No. 3.1e-280; 

Matches 1560; Conservative 0; Mismatches 352; Indels 3; Gaps 1; 
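
The percentages reported for this hit can be cross-checked from the counts printed beside them. The sketch below assumes (the report does not state the definitions) that Query Match is the hit score divided by the perfect score, and that Best Local Similarity is the number of matches divided by the aligned length (matches + mismatches + indels). The variable names are illustrative only.

```python
# Cross-check of the summary percentages for RESULT 15 (AX456523).
# Assumed definitions (not stated in the report itself):
#   Query Match           = score / perfect score
#   Best Local Similarity = matches / (matches + mismatches + indels)

def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)

perfect_score = 2340          # "Perfect score" from the search header
score = 1335.8                # "Score" for this hit
matches, mismatches, indels = 1560, 352, 3

query_match = percent(score, perfect_score)
best_local_similarity = percent(matches, matches + mismatches + indels)

print(query_match, best_local_similarity)
```

Under these assumed definitions both values agree with the figures printed in the report (57.1% and 81.5%), and the aligned length of 1915 equals the reported length of the database sequence.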

Qy 107 ATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGC 166 

I I I I I I I I I I I I I I I I III II III III II I I I I I I II I 

Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 

Qy 167 TCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGC 223 

I I I II I I I I I II I I M I I I I II I I I I I I I I I I I II I I I II 

Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 224 ATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCT 283 

MM MM I I M I I I II II II I I I II I M I II I I II I M II I II I III 

Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCA 180 

Qy 284 TGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGG 343 

I II I I II I I I I II I I II II II I I I II II II I II I I I I I II II I I I I II I II 

Db 181 TGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGC 240 

Qy 344 CAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATG 403 

Mill I I II I I I II I I I I I M I I I I II I I I II II II M II I II II I II II M II 
Db 241 CAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATC 300 

Qy 404 TCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCG 463 

II I I I I I I II I I II I II I MUM I I II I I M I II I I II M III I I I 

Db 301 TCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAG 360 

Qy 464 CTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTG 523 

II I I I I I II I I M I I I I I I M I II I I I I II I I I II M I I I M II M II I Ml 

Db 361 CTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTG 420 

Qy 524 AGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGC 583 

II I M I M M I I I I I I I I I I II I III Mill Ml I I I II II I II II I I I II 

Db 421 AGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGC 480 



Qy 584 AATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCAT 643 

I I II II I I I I II I I I I I I I II I I I II I I I I I I I M I I I I I I I I I I 

Db 481 TCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCAC 540 

Qy 644 GTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGC 703 

I I I M II I I I I I I II I I M III I II Mill I I I M I I II I I I I I I I I I 
Db 541 GTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGC 600 

Qy 704 CGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCA 763 

M II II I I I I I II I I I I I II II I II I I II I I I I I II I I II I I I I II I I I I 
Db 601 CGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCA 660 

Qy 764 ACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCT 823 

I I I I I I M I I I I I I I I M M I I I I I I I I I I I I I I I I I III III II I I I II I 

Db 661 ACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCT 720 

Qy 824 CGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883 

MINI I I I I I I I I I II II I I I I I I I II I I I I II II I I I I I II I II II I I 

Db 721 CGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACAC 780 

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943 

II I I I I I I I I I I I I I I M I I I I I II I I I I II I M I I I I I I I II MM III 

Db 781 TTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAG 840 

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

II I M I I M I II I II I II I I I I II I M I M I M I II II I I M II II Mill 

Db 841 ATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGAT 900 

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCC 1063 

II II I II II I II II I II II M I I M I II II II II I II II M II II I M I I 
Db 901 TTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTAC 960 

Qy 1064 AAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTG 1123 

IN I I II II I II I I I II I Mill MM MM I II I III II II I II 

Db 961 AAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTG 1020 

Qy 1124 AAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAA 1183 

I II I II I II II I I I I II II II II I II I II I II I I M I I M I II II III 

Db 1021 GAGAACATTGAAAGAGCACGATACCTGAAAACCTTACCCACGGTTCCTTTCAAAACAAAA 1080 

Qy 1184 GATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTG 1243 

I I I M II I II I I I I II I I II II I II M II II I II I I II II M I I II 
Db 1081 GATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTA 1140 

Qy 1244 GTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTC 1303 

I I I I I I I I I I I II I I II I M I I II II I I I II M I II I II II I M II II I III 
Db 1141 ATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTC 1200 

Qy 1304 CTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGC 1363 

I I I I I I I I I I II II I I I II I I II I I II II I II I II I I II II I I II I 

Db 1201 CTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGC 1260 

Qy 1364 GTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTG 1423 

II II II II II III I I I M I I II II II I I II I II I I I I II II II II M II 

Db 1261 GTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTG 1320 

Qy  1424 AATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAG 1483
         |||||||||||| ||||| |||| ||||||||||||||||||||||| ||||| || ||
Db  1321 AATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCAT 1380

Qy  1484 AAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACC 1543
         |||||||||||| |||| ||||| |  || ||||||||||||||||||||  | |||||
Db  1381 AAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACG 1440

Qy  1544 ATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTT 1603
          | ||||||||||||||||| || ||||| ||||||||  ||||||| |||||| |||||
Db  1441 GTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTT 1500

Qy  1604 GGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTG 1663
         |||||||| ||||||||||| |||||||| ||||||||||| ||||||||||| ||||||
Db  1501 GGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTG 1560

Qy  1664 CTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATT 1723
         || |||||||| |||||||| || ||||| ||||||||| |||||||||||||   |||
Db  1561 CTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATC 1620

Qy  1724 GCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT 1783
          | ||| ||||| |||||||||||||  |||||||||||||||||||||||||||||||
Db  1621 TCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTA 1680

Qy  1784 AAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAG 1843
         |||||| |  |||||||||||||||||||||| ||  |||||||||| || |||||||||
Db  1681 AAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAG 1740

Qy  1844 TTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATG 1903
         || ||||| ||||| |||||||||||  | || ||    ||| ||  || | | || |||
Db  1741 TTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATG 1800

Qy  1904 TGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGA 1963
         || ||| |||| |||||  | || ||||| |||||||||||||||||||| ||||| |||
Db  1801 TGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGA 1860

Qy  1964 TTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAG 2018
         |||||    ||||| || || || |||   ||||| |||||||| ||||||||||
Db  1861 TTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAG 1915



Search completed: February 26, 2004, 06:21:17 
Job time : 6017.48 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 

Run on: February 26, 2004, 00:39:18 ; Search time 592.658 Seconds 

(without alignments) 
16773.223 Million cell updates/sec 



Title: US-09-989-981A-5 
Perfect score: 2340 

Sequence: 1 gtcaggtggagcaggcaggg aatattcataaacctatggg 2340 



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 3373863 seqs, 2124099041 residues 
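
The throughput figure in this header can be reproduced approximately from the other header values. The sketch below assumes that one "cell update" is one query-residue by database-residue comparison and that both strands of each database sequence are searched; the factor of 2 is an assumption, inferred only from the reported hit total being twice the number of sequences searched.

```python
# Approximate reconstruction of the "Million cell updates/sec" figure.
# Assumptions (not stated in the report): cells = query length x database
# residues, doubled because both strands appear to be searched.

query_length = 2340            # "Perfect score" with IDENTITY_NUC equals query length
db_residues = 2_124_099_041    # "Searched: ... residues"
search_seconds = 592.658       # "Search time"
strands = 2                    # assumed: forward and reverse complement

cell_updates = query_length * db_residues * strands
rate_millions = cell_updates / search_seconds / 1e6

print(f"{rate_millions:.3f} million cell updates/sec")
```

Under these assumptions the result comes out within a small fraction of a percent of the reported 16773.223, the residual being attributable to rounding of the printed search time.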

Total number of hits satisfying chosen parameters: 6747726 

RESULT 1 
AAD22009 

ID AAD22009 standard; DNA; 2340 BP. 
XX 

AC AAD22009; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG). 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21; ds. 



XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "Human SSG protein" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758. 
XX 

PR 18-APR-2000; 2000US-0198465P. 

PR 15-MAY-2000; 2000US-0204234P. 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13290. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG DNA. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 100.0%; Score 2340; DB 6; Length 2340; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy     1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy    61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy   121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy   181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy   241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300

Qy   301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360

Qy   361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420

Qy   421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480

Qy   481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy   541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600

Qy   601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660

Qy   661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720

Qy   721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

Qy   781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

Qy   841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900



Qy 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I 
Db 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

Qy 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080 

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140 

I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 
Db 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140 

Qy  1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

Qy  1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260

Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320 

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380 

Qy  1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy  1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I 
Db 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620 

I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620 

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680 

I I I I I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680 

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I 11 I I I I I I I I I I I I I 
Db 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740 



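Qy  1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800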



Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860 

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920 

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 
Db 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980 

Qy 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
Db 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040 

Qy 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I M I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100 

Qy 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160 

I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I I I I I I I I I I I I I 
Db 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160 

Qy 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I I I I I I I I I I I I I I M I I I I I I I 
Db 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220 

Qy 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I 
Db 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280 

Qy  2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340



RESULT 2 
AAD48882 

ID AAD48882 standard; DNA; 2340 BP. 
XX 

AC AAD48882; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "hABCG5 protein" 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31704. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 77; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides, ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 DNA 
XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 100.0%; Score 2340; DB 7; Length 2340; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M M I I I I I I I I I I 

Db 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 



Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240 

Qy 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I M I I I I I I I I I I I I 

Db 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300 

Qy 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 

Db 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360 

Qy 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 

Db 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420 

Qy 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480 

I I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480 

Qy 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540 

Qy 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I 
Db 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600 

Qy 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I 
Db 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660 

Qy 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720 

I I I I I I I I I I M I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720 

Qy 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I 
Db 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780 

Qy 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840 

I I I M I I M I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840 

Qy 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I M I I I I M I I I I 
Db 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900 

Qy 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

Qy 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M I I I I I I I M 
Db 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 



Qy   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

Qy   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

Qy   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260

Qy   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320

Qy   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380

Qy   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500

Qy   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560

Qy   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

Qy   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740

Qy   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800

Qy   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

Qy   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920



Qy   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980

Qy   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

Qy   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

Qy   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160

Qy   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
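
The middle row of each Qy/Db block marks identical residues column by column. A minimal sketch of how such a row could be regenerated from the two sequence rows, assuming an ungapped alignment and the usual convention of "|" for identity and a space for a mismatch. The function name and the short test strings are hypothetical and are not taken from the search program:

```python
def match_line(qy: str, db: str) -> str:
    """Return one '|' per identical column and a space per mismatch.

    Assumes both rows are already aligned and have equal length
    (no gap characters are interpreted specially here).
    """
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if a == b else " " for a, b in zip(qy.upper(), db.upper()))


# Hypothetical four-column example with one mismatch in column 3.
print("Qy " + "ACGT")
print("   " + match_line("ACGT", "ACTT"))
print("Db " + "ACTT")
```

Applied to a full 60-residue row pair, the same function would yield a 60-character row that can be compared against the printed one.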



RESULT 3 
ABK51682 

ID ABK51682 standard; cDNA; 2516 BP. 
XX 

AC ABK51682; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 cDNA sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 37-38; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the cDNA sequence of human ABCG5 gene located on 

CC chromosome 2p21 

XX 

SQ Sequence 2516 BP; 601 A; 631 C; 636 G; 648 T; 0 U; 0 Other; 

Query Match 99.9%; Score 2338.4; DB 6; Length 2516; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2339; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 
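
The percentages in the summary lines above can be reproduced from the raw counts. A minimal sketch, assuming that Query Match is the score divided by the perfect score (2340, as stated in the search header) and that Best Local Similarity is the number of matches divided by the number of aligned columns. Both formulas are inferred from the reported figures rather than taken from the search program's documentation, and the function names are hypothetical:

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score expressed as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def local_similarity_pct(matches: int, mismatches: int, indels: int = 0) -> float:
    """Matches expressed as a percentage of all aligned columns."""
    return round(100.0 * matches / (matches + mismatches + indels), 1)


# Figures reported for ABK51682; the perfect score comes from the search header.
print(query_match_pct(2338.4, 2340))   # 99.9
print(local_similarity_pct(2339, 1))   # 100.0
```

Under these assumptions a single mismatch in 2340 columns still rounds to 100.0% similarity, which is consistent with the line "Best Local Similarity 100.0%" appearing next to "Mismatches 1".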

Qy      1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     35 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 94

Qy     61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     95 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 154

Qy    121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    155 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 214

Qy    181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    215 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 274

Qy    241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    275 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 334

Qy    301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    335 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 394

Qy    361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    395 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 454

Qy    421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    455 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 514

Qy    481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    515 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 574

Qy    541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    575 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 634

Qy    601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    635 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 694

Qy    661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    695 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 754

Qy    721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    755 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 814

Qy    781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    815 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 874

Qy    841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    875 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 934

Qy    901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    935 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 994

Qy    961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    995 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1054

Qy   1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1055 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1114



Qy   1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1115 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1174

Qy   1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1175 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1234

Qy   1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260
          |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
Db   1235 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTTACAAGAAACTTGGTGAGAAATAAGCTGGC 1294

Qy   1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1295 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1354

Qy   1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1355 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1414

Qy   1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1415 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1474

Qy   1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1475 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1534

Qy   1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1535 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1594

Qy   1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1595 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1654

Qy   1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1655 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1714

Qy   1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1715 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1774

Qy   1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1775 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1834

Qy   1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1835 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1894

Qy   1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1895 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1954

Qy   1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1955 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 2014



Qy   1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2015 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2074

Qy   2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2075 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2134

Qy   2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2135 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2194

Qy   2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2195 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2254

Qy   2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2255 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2314

Qy   2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2315 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2374



RESULT 4 
ABK51681 

ID ABK51681 standard; DNA; 1920 BP. 
XX 

AC ABK51681; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ds. 
XX 

OS Homo sapiens. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1920 

FT /*tag= a 

FT /product= "Human ABCG5 protein" 

FT /transl_except= (pos: 4..9, aa: GDLSSLTPGGSMGL) 

FT /note= "This sequence contains 13 exons" 

XX 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 



PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU98984. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 38; Page 36-37; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the human ABCG5 gene located on chromosome 2p21. 

CC This sequence encodes the human ABCG5 protein of the invention 

XX 

SQ Sequence 1920 BP; 440 A; 503 C; 486 G; 491 T; 0 U; 0 Other; 

Query Match 82.1%; Score 1920; DB 6; Length 1920; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1920; Conservative 0; Mismatches 0; Indels 0; Gaps 0 
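
Because this hit has no indels or gaps, query and database coordinates differ by a constant offset. A minimal sketch of that mapping, assuming the alignment start positions printed in the first block below (Qy 143 against Db 1). The function name and default arguments are hypothetical and only illustrate the arithmetic:

```python
def db_to_query(db_pos: int, qy_start: int = 143, db_start: int = 1) -> int:
    """Map a database position to the query position in an ungapped alignment.

    The defaults are the start coordinates of the ABK51681 alignment;
    the mapping is only valid when the alignment has no indels.
    """
    return db_pos + (qy_start - db_start)


# First and last aligned database positions of the 1920 bp coding sequence.
print(db_to_query(1))     # 143
print(db_to_query(1920))  # 2062
```

The same function with `qy_start=1, db_start=35` would describe the offset of 34 seen in the ABK51682 alignment, read in the opposite direction.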



Qy    143 ATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCC 202
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 ATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCC 60

Qy    203 CCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGG 262
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 CCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGG 120

Qy    263 CCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTC 322
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 CCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTC 180

Qy    323 TCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAA 382
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAA 240



Qy    383 ACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAG 442
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 ACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAG 300

Qy    443 GTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTC 502
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTC 360

Qy    503 CTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCG 562
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 CTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCG 420

Qy    563 CTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATG 622
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATG 480

Qy    623 GCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGC 682
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 GCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGC 540

Qy    683 ATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAG 742
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 ATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAG 600

Qy    743 GTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTC 802
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTC 660

Qy    803 GTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCC 862
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 GTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCC 720

Qy    863 CGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATT 922
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 CGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATT 780

Qy    923 TTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCT 982
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 TTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCT 840

Qy    983 GAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAG 1042
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAG 900

Qy   1043 GAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCA 1102
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 GAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCA 960

Qy   1103 GCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCA 1162
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 GCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCA 1020

Qy   1163 ATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTG 1222
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 ATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTG 1080



Qy 


1223 


Db 


1081 


Qy 


1283 


Db 


1141 


Qy 


1343 


Db 


1201 


Qy 


1403 


Db 


1261 


Qy 


1463 


Db 


1321 


Qy 


1523 


Db 


1381 


Qy 


1583 


Db 


1441 


Qy 


1643 


Db 


1501 


Qy 


1703 


Db 


1561 


Qy 


1763 


Db 


1621 


Qy 


1823 


Db 


1681 


Qy 


1883 


Db 


1741 


Qy 


1943 


Db 


1801 


Qy 


2003 


Db 


1861 



AGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAG 1282 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I I I I I M I M I I I I I 
AGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAG 114 0 

AATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTA 1342 

I I II II I I I I I I I I I I II I I I I I II II I II I I II II I II I M I I I I I II II I II II I II I 
AATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTA 1200 

AAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTAC 1402 

I I I I II I I I I I I I I I I I I I I I I I I II I II I I I I I I I M I I M I I I II I I I I I I II I I M I 

AAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTAC 1260 

ACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAG 1462 

I I M I I I I II I I I I II I I I I I I I I I I I I M II I I I I I I I I II I I II I M I I I I I I I I I I I 
ACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAG 1320 

AGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTC 1522 

M I I I I I I I M I M I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTC 1380 

CCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTA 1582 

II I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTA 144 0 

CATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATT 1642 

M I I I I I II I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATT 1500 

GGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAA7UVTCCAAATATAGTCAACAGT .1702 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I M I I I I I I I I I 
GGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGT 1560 

GTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATA 1762 
M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
GTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATA 1620 

CAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGT 1822 

I I II II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CAAGT^TGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGT 168 0 

GAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTT 1882
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
GAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTT 1740

TCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACC 1942
I I I I II I I I II I II I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
TCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACC 1800

TGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCA 2002 
I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II M I I I I I I I I I II I I I I I I I I II 
TGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCA 1860

GCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 2062 

I I M I M M I I I I I M I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I M I I I I I I I I M 

GCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 1920 



RESULT 5 
ABK51685 

ID ABK51685 standard; cDNA; 2354 BP. 
XX 

AC ABK51685; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 cDNA sequence. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ss.
XX 

OS Mus sp. 
XX 

PN WO200227016-A2.
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX 

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English.
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 



CC excretion and/or decreasing cholesterol adsorption. The present nucleic 
CC acid sequence represents the cDNA sequence of the mouse ABCG5 gene of the 
CC invention 
XX 

SQ Sequence 2354 BP; 573 A; 604 C; 594 G; 583 T; 0 U; 0 Other;
Query Match 60.2%; Score 1409.2; DB 6; Length 2354; 
Best Local Similarity 80.4%; Pred. No. 0; 

Matches 1664; Conservative 0; Mismatches 403; Indels 3; Gaps 1; 

Qy 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 57 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 116

Qy 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 
Db 117 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 176 

Qy 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

III II I I I I M M I II I I II I I I M I I I M I I I I I I I I I I 
Db 177 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 236 

Qy 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAG 261

I II II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I 
Db 237 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGG 296 

Qy 262 GCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGT 321 

III MINI I I I I I I III I I I I I I II I I I II MINI I I I I I II I I M I I I 

Db 297 GCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGT 356 

Qy 322 CTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAA 381 

I I I I I I I I I I I I I I I I II I I I I I II I I I I I M I I I I I I I I I I I II II Mill 
Db 357 CTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAA 416 

Qy 382 AACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGA 441 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I Mill I Mini I II II I 

Db 417 GACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGA 476 

Qy 442 GGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGT 501 

I I I II I I I I I I III I I I II I II I I I II I I M II I I M II I I I II I II II II I 
Db 477 GGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGT 536 

Qy 502 CCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGC 561 

I M I I I II I I I M I II I I I I I II I II I I II M I I I I I I I I III II I II M 

Db 537 CCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC 596 

Qy 562 GCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCAT 621 

I I I M I I I I II I II II II I I II II I I I I M II I I I II I I II II I 

Db 597 GATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCAT 656

Qy 622 GGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGG 681 

I I I I I II I II I I I II M II II I II II II I M I II II I I III I II Mill 
Db 657 GACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGG 716 

Qy 682 CATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAA 741 

II I I II I M I I II I I I I I II II I I M I M I I I I I I I II II I I I I I II M 

Db 717 AATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAA 776 



Qy 742 GGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGT 801

I I I I I I I II I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 777 GGTCATGATGCTAGATGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGT 836 

Qy 802 CGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCC 861 

I I III III II I II I I II I I I I I I I II I I I I M I II I II M I I II I I I I M 

Db 837 CCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCC 896

Qy 862 CCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGAT 921 

II I I I II I I I II II I III I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 897 TCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGT 956

Qy 922 TTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCC 981

M I I I I I I I I I I I I I III I I I I I I I I I I I I I I II I I I I I I I I I I M I I I II 
Db 957 GTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCC 1016 

Qy 982 TGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAA 1041 

I I I I I I I I I II II Mill II II I I I I I I I I II I I I I I II I I I I I I M I I I 

Db 1017 TGAACATTCCAATCCCTTTGATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAG 1076 

Qy 1042 GGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATC 1101 

II I I I I I I I I I I I I I I I I I I I MM MUM I II II II I I I I II I I I II 
Db 1077 AGAGCGGGAAATAGAAACGTACAAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATC 1136 

Qy 1102 AGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACC 1161

I M I I II I II I I I I II I I I II II I I I I I I II M I II II II I I I 

Db 1137 TGACATCTATCACAAAATTCTGGAGAACATTGAAAGAGCACGATACCTGAAAACCTTACC 1196 

Qy 1162 AATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCT 1221

I II I I II M I I I II II I II II I I I I I I II I III II II I I I II M M 
Db 1197 CACGGTTCCTTTCAAAACAAAAGATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCT 1256 

Qy 1222 GAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCA 1281

II II II I I II I I I I I II II I II I II I II I I I I II I II II II II I I I I I I I I I 

Db 1257 GAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCA 1316

Qy 1282 GAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCT 1341

M II I I I II I II I I I I I II I II I I I II I I II I I M I I I I I I II III 
Db 1317 GAATCTGATCATGGGCCTCTTCCTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCT 1376 

Qy 1342 AAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTA 1401 

I I I II I III I I I I II I I I I I I II II II II III II II I I I I I I I II II II 

Db 1377 AAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATA 1436 

Qy 1402 CACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGA 1461 

III I II I I I I I II I I I I I I I I II I II I I II I Mill MM I I I II I I I I M II I 

Db 1437 CACCGGCATGCTCAATGCTGTGAATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGA 1496 

Qy 1462 GAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCT 1521 

M M I I II I II I II II M I II II M I I I I I MM I II I I I II I I II I II I 

Db 1497 GAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCT 1556

Qy 1522 CCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTT 1581 

II I I II M I I I I I I II II I II II II II I II I M I I I II II I I I I I II II II 

Db 1557 CCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTT 1616 



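Qy 1582 ACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAAT 1641

Db 1617 GTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAAT 1676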



Qy 1642 TGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAG 1701 

III I I I I M I I I M I I I I I I II I II I II II I I M II I I II I I II I I I I M I I I 
Db 1677 TGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAG 1736 

Qy 1702 TGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACAT 1761

I I I I I I I I I I I I I I III I III I I I I I I I I I I I I I I I I I I I I II I I I I I I 
Db 1737 TATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACAT 1796

Qy 1762 ACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAG 1821

I I I I I I I I I I II I I I I I II I I MINI I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 1797 ACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTG 1856 

Qy 1822 TGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGT 1881

I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I II I I I I I II II 
Db 1857 TGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACAC 1916 

Qy 1882 TTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAAC 1941

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I M I I M 
Db 1917 CTCTATGCTAAATCACCCGATGTGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAAC 1976 

Qy 1942 CTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCC 2001

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I II II II III I I I I I II 

Db 1977 CTGCCCAGGTGCTACATCCAGATTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCC 2036 

Qy 2002 AGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTA 2061 

I I I II I I I I I I I I I I I I I M I I I I II I III I I I I I I I M I M I I I M II 
Db 2037 AGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATA 2096

Qy 2062 GTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091

I I I I I I I I I M I III 

Db 2097 GTTAAGATGACAGGCAGGAAAGGGTTAATG 2126



RESULT 6 
AAD22008 

ID AAD22008 standard; DNA; 2258 BP. 
XX 

AC AAD22008; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG).
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; ds.

XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers

FT CDS 47..2005

FT /*tag= a 



FT /product= "Mouse SSG protein" 
XX 

PN WO200179272-A2.
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.

PR 15-MAY-2000; 2000US-0204234P.
XX 



PA (TULA-) TULARIK INC.
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13289.
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS claim 8; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG DNA. Mouse SSG is located on chromosome 17 

XX 

SQ Sequence 2258 BP; 549 A; 579 C; 567 G; 563 T; 0 U; 0 Other; 



Query Match 59.6%; Score 1395.6; DB 6; Length 2258; 

Best Local Similarity 80.7%; Pred. No. 0; 

Matches 1642; Conservative 0; Mismatches 389; Indels 3; Gaps 1; 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 GGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCC 60

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 

I I I I I III II III III II I II I I M I I I I I II I I I I I II 
Db 61 CTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCT 120

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGCATCCTCCATGCCTC 237

II I I I I I I I I I I I I I I I I I M I I II I I I I I I I I I I I II 



Db 121 GGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTC 180 

Qy 238 CTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTG 297

I I I M I I I I I I I I I I I I II Mil I II M I I I I I II III I I I I I I I I MM 

Db 181 CTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTG 240 

Qy 298 GACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCAT 357 

I II II II II II I I II I II I II I I M II II II I I II II II I II II I II I I II I 
Db 241 GGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCAT 300 

Qy 358 CCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGG 417 

I II II I II II II I I II II II I II I I I I II II II I I II II II M II II I II I II I 

Db 301 CTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCG 360 

Qy 418 GCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCA 477 

II II I I I I II I I II II I I II I I I II I II III I I I I I II II II I I II 

Db 361 GCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCA 420

Qy 478 GTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGT 537 

I II I II I I II II I I I II I I I II I I I I I I I I II I II I I I II I I II II II I I I II 

Db 421 GTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGT 480

Qy 538 GCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTT 597 

II M I II I I I II I II I I I II I II I I II I I I I I I M I I I I I I I I I 

Db 481 GCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTA 540 

Qy 598 CCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACT 657 

I I II I II I II Mill I I II I I II I I II II II II II I I II Mill I I II I I 
Db 541 CAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAAT 600 

Qy 658 GATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGC 717 

I II I II II III I II I II I I II I I I II II I II II I II I M II I I II I I M 

Db 601 GATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGC 660 

Qy 718 AGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGA 777 

II I I I I II II I I I II II I I II I I I I I II I II II II I II I I II II II I II II 

Db 661 AGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGA 720 

Qy 778 CTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAAT 837

II II II II I I II I I I II II II II I III III II M II I II II M I I I II I II 
Db 721 CTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAAT 780 

Qy 838 TGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGC 897 

II I I II I I I I II I II I II II I II I I I II M I II II I III I II I I I II II I 
Db 781 TGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGC 840 

Qy 898 CATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTT 957 

I I II II II I II I I I I I II I I II II II II I I II I I III I II M I I Mill 

Db 841 CATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTT 900 

Qy 958 CAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCT 1017 

II II MM II I I I I II II I II II II I II II II II II I II II II I I II I I I 

Db 901 CAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTT 960

Qy 1018 GACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGAT 1077 

III II I II I II I I M M I II I II II II II I II II I II I II II II I I II II I 

Db 961 GACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGAT 1020



Qy 1078 GATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAG 1137

I I I I fl I I I I I I I I I I I I I I II I III I I II I II MM M M I II I 
Db 1021 GCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTGGAGAACATTGAAAG 1080 

Qy 1138 AATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGT 1197 

I I I I M II M II II I I I I M I I II II II II I I II II II I I II II II I I 

Db 1081 AGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAAGATCCTCCTGGGAT 1140 

Qy 1198 TTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCT 1257 

III II II I II II II I I II I I II I I II I M I I II II I II I M II I I I 
Db 1141 GTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCA 1200

Qy 1258 GGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGT 1317 

II I I M I I II I II II I I I I I I II M I I M II I I M M I II II I I II II I I I I 

Db 1201 GGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTCCTCATTTTCTACCT 1260 

Qy 1318 TCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTA 1377 

I I I I I I I I I MM M I II I I II M I I I I I II I M I II I I I I II I I 

Db 1261 TCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTA 1320

Qy 1378 CCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGT 1437

III I I I I II I II II M M Mill I I II M II M I II II II II M I II M I I I 

Db 1321 TCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTGAATCTGTTTCCCAT 1380

Qy 1438 GCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGAT 1497 

II I I II II I I II II II II I I II M I M I I II II I II II II I II II I II II I I I 

Db 1381 GCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCT 1440 

Qy 1498 GCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAG 1557 

III Mill I II I II II M II I II II II I I I I I Mill I I II I M M I II 

Db 1441 GCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAG 1500 

Qy 1558 TGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGC 1617 

II I I II II Mill I I II II II M I II I I II I II I II II II II I I I II II II I 
Db 1501 TGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGC 1560 

Qy 1618 TGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGT 1677 

I I I II I II II I II I II I M II II I I I II II I I I II I II M I II I I II I M II II 
Db 1561 TGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGT 1620 

Qy 1678 CCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGT 1737 

I II I II M I I I II I I I I II II I II I I I I I I II I II III I III II II I I 

Db 1621 CCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTAT 1680

Qy 1738 TGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTA 1797

II II II I II I M I I II II I M II I II I II I II II I II I M I II II II II I I I II 

Db 1681 TGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTA 1740

Qy 1798 TTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAA 1857

I I I II M II II I II I M I II II M II II II II I I M I II II M Mill II I I I 

Db 1741 TTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAA 1800

Qy 1858 TTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCA 1917 

M I II I I I II I I I I I I I II I I I I I I I I II I II M I I I II II 

Db 1801 CTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATGTGCGCCATCACCCA 1860



Qy 1918 AGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTT 1977 

III I II I I I I I I I I I I I I M I I I I I I I I I II II I II I I M I I I I II I I I 

Db 1861 AGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGATTCACGGCAAACTT 1920

Qy 1978 TCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAAT 2037

I I I I I I I M II I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I II I 
Db 1921 CCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGT 1980 

Qy 2038 AAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091

I I II I I I I I I I I M I I I I II I I I I I I I I I I III 

Db 1981 CAGGGACTACCTGATTAGCAGATAGTTAAGATGACAGGCAGGAAAGGGTTAATG 2034



RESULT 7 
AAD48880 

ID AAD48880 standard; DNA; 1959 BP. 
XX 

AC AAD48880; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 
KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone;
KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 
KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 
KW ABCG5; gene; ds.
XX 

OS Mus sp. 

XX 

FH Key Location/Qualifiers 
FT CDS 1..1591

FT /*tag= a 

FT /product= "mABCG5 protein"

XX 

PN WO200281691-A2.
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX 

PR 20-NOV-2000; 2000US-0252235P.
PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H;
XX 

DR WPI; 2003-058548/05. 
DR P-PSDB; AAE31702. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 



XX 

PS claim 11; Page 73; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 DNA 
XX 

SQ Sequence 1959 BP; 468 A; 506 C; 495 G; 490 T; 0 U; 0 Other; 



Query Match 58.4%; Score 1365.4; DB 7; Length 1959; 

Best Local Similarity 81.4%; Pred. No. 0; 

Matches 1595; Conservative 0; Mismatches 361; Indels 3; Gaps 1; 

Qy 107 ATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGC 166 

I I I I I I I I I I I I I I I I III II III Ml II I II I II I I I 

Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 

Qy 167 TCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGC 223

I I I II I I I I I II I I I I I I I I I I I I I I I II I I II I II I I II 
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 224 ATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCT 283 

I I I I I M I I I I I I I I M I I I I I I I II I I I I I I I I I I I I I I I I I I I I M 

Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCA 180 

Qy 284 TGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGG 343 

MM I II I I I I II I I I II I I I I II I I II II I II I I I I II I I I I I II II I II 
Db 181 TGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGC 240 

Qy 344 CAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATG 403

I I I I I M II I II II II I I I I I I I I II I I I I II I I I I I I I I I I I I I M I II I I M 

Db 241 CAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATC 300 

Qy 404 TCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCG 463

II I II I II I I I I II I II I I II II I I I I M II II I I I II I I I III I I I 

Db 301 TCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAG 360 

Qy 464 CTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTG 523 

I I I I I I II I I I II I II I I I II I II I I I II I I II I II M I I I I II I II I I III 

Db 361 CTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTG 420 

Qy 524 AGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGC 583

II M I I II I M I I I I I II I II II III II I I I III I I II I I II II I II I I II 

Db 421 AGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGC 480

Qy 584 AATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCAT 643 

I I II II I I I I I II I I I I I II I I I II I I I I I I I I M I I M I M II I 

Db 481 TCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCAC 540

Qy 644 GTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGC 703

II I M I II I I I II I M I II I I I I M I II I I I M I I I I I I M II I I I II 

Db 541 GTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGC 600 



Qy 704 CGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCA 763

II II I I I II I I I I I II II II II I I I II II I I I I I I I I I II I I I I I I I I I I 
Db 601 CGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTGATGATGCTAGATGAGCCA 660 

Qy 764 ACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCT 823 

I M I I I I I I I I I I I I I I I I I I I I I I Mill II I I I I I III III II MINI 

Db 661 ACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCT 720

Qy 824 CGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883 

MINI I I I I I I I I I I I II I I I I II I II I I I I I I M II I I I M I II II I I 

Db 721 CGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACAC 780

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943 

II I I I M I M I I I I I I I I I I I I M I I I I I M I I I I I I I I I I M I I I I III 

Db 781 TTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAG 840 

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

I I I I M I I I I I I I I I I I I I I II II I I I I I I I I I I I M I I I I I II II I I I I I 

Db 841 ATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGAT 900 

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCC 1063

II II I I I I I I I I I I I I I I I I I I I I I M I I M I II I I I I I I M I I I I I I I I 

Db 901 TTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTAC 960 

Qy 1064 AAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTG 1123 

III MM I I II I I I I II I I I I I I II II II I I I II I III MM I II 

Db 961 AAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTG 1020

Qy 1124 AAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAA 1183 

I I II II I I I II II I II I M I M I I I II I I I I I I II I I I II M I II I III 

Db 1021 GAGAACATTGAAAGAGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAA 1080 

Qy 1184 GATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTG 1243 

I M I I II II I I II I I I I I I I II I I I I I I I II II II M I I I II II I I 

Db 1081 GATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTA 1140 

Qy 1244 GTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTC 1303 

II I I I II I I I I I II I I I II II I II I M I I I II II II I II I I II I II II I III 

Db 1141 ATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTC 1200

Qy 1304 CTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGC 1363 

I II I I II M I I I M I I II II I II I II II I II I I I M I II M I II I I 

Db 1201 CTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGC 1260

Qy 1364 GTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTG 1423 

II II II II II III II II II I II I I I M I I I I I I I M I II II M I I II I I 

Db 1261 GTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTG 1320

Qy 1424 AATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAG 1483 

I I I I II II I I I I I I I I I II II I I II II I I M II I M I II I I I I I II I II II II 

Db 1321 AATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCAT 1380

Qy 1484 AAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACC 1543 

I II I II II 1 1 1 1 I II I I II 1 1 I II 1 1 1 1 II 1 1 1 1 1 1 II I II 1 1 1 I II II I 

Db 1381 AAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACG 1440



Qy 1544 ATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTT 1603 

I I M I I I I I I I I I I I I I I II II I I I I I I I I M I I I M I II I I I II I I II II 
Db 1441 GTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTT 1500 

Qy 1604 GGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTG 1663 

I I II II I I I I I I II II I II I I I I M I I I II II I II I I I M I I II I I I I I I I I I I I 

Db 1501 GGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTG 1560 

Qy 1664 CTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATT 1723 

II I I I I I I I I I I I I I I I I II I I I I I I I I I I M II I I I I I I I I I I I I I III 

Db 1561 CTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATC 1620 

Qy 1724 GCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT 1783 

I III I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I I I I I 
Db 1621 TCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTA 1680 

Qy 1784 AAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAG 1843

I II I II I I I I I I I I I I I II I I I I I I I I I I II I I M I I I I I I II I I I I I I I I I 

Db 1681 AAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAG 1740 

Qy 1844 TTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATG 1903

II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 1741 TTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATG 1800 

Qy 1904 TGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGA 1963

II III I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I III 
Db 1801 TGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGA 1860

Qy 1964 TTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATA 2023 

II II I I I I I I II II II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1861 TTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATA 1920 

Qy 2024 GTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 2062 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1921 GTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATAG 1959 



RESULT 8 
ABK51686 

ID ABK51686 standard; cDNA; 2035 BP. 
XX 

AC ABK51686; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; ss; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Rattus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 8..1965 

FT /*tag= a 

FT /product= "Rat ABCG5 protein" 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96986. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45-46; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the rat ABCG5 protein of the invention. (Updated on 

CC 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 2035 BP; 481 A; 533 C; 537 G; 484 T; 0 U; 0 Other; 

Query Match 58.2%; Score 1363; DB 6; Length 2035; 

Best Local Similarity 80.6%; Pred. No. 0; 

Matches 1607; Conservative 0; Mismatches 385; Indels 3; Gaps 1; 

Qy 100 GTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAA 159 

I I M I I I I I I I I I I I I I I I I I I I I I II III I I I II II 
Db 1 GCTGGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAA 60 



Qy 160 CAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAG 216 



Db 61 CAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAG 120 

Qy 217 CCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACAT 276 

I I II II II III II II I I I M I II I I II II II I I I I I I I I I I I I I I I 

Db 121 CTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTCGGGCCCTGGTGGAACAT 180 

Qy 277 CACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGA 336 

M III I I I I I I I I Mill I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II 
Db 181 CAAATCATGCCAGCAGAAGTGGGACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGA 240 

Qy 337 GAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGA 396 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 241 GAGTGGCCAGACCATGTGCATCTTAGGTAGCTCAGGCTCAGGGAAAACCACGCTGCTGGA 300 

Qy 397 CGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGG 456 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 301 CGCCATCTCTGGGAGGCTGCGGCGCACAGGGACCTTGGAAGGGGAAGTGTTTGTGAACGG 360 

Qy 457 CCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACAC 516 

I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 CTGCGAGCTGCGCAGGGACCAGTTCCAAGACTGCGTCTCCTACCTCCTGCAGAGCGATGT 420 

Qy 517 CCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCG 576 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I Mill III I I II I I I MM 
Db 421 CTTTCTGAGCAGCCTCACGGTGCGGGAGACGCTGAGATACACGGCGATGCTGGCTCTCCG 480 

Qy 577 CCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCT 636 

I II II I I II II I I II II II II Mill III II II II I II II I II I 

Db 481 CAGCAGCTCCGCGGACTTCTACGACAAGAAGGTAGAGGCAGTCCTGACAGAGCTGAGTCT 540 

Qy 637 GAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGA 696 

II I II I II II II II II I I I M II II M II I I I II II I II II II I I I II 

Db 541 GAGCCACGTGGCAGACCAAATGATCGGCAACTATAATTTTGGGGGGATTTCCAGTGGCGA 600 

Qy 697 GCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGA 756 

I II II II I I II II II II II I II I I I II II II II I II I II II II II II I II I 

Db 601 GCGGCGCCGAGTGTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTTGA 660 

Qy 757 TGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGA 816 

II M II M I I M II II I II II II M II II II II II I II Ml I I II I MM II 

Db 661 CGAGCCAACCACAGGACTGGACTGCATGACTGCAAATCATATCGTCCTCCTCTTGGTCGA 720 

Qy 817 ACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTT 876 

II II II I II II II I II I II I II II II II II I I I II II I I II II I II II I II 
Db 721 GCTGGCTCGCAGGAACCGCATTGTAATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTT 780 

Qy 877 TCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCC 936 

M I III I II II II II II M I I I II I II I II II II I II II II I II I II II 
Db 781 CCACCACTTCGACAAAATTGCCATTCTGACTTACGGAGAGTTGGTGTTCTGTGGCACGCC 840 

Qy 937 AGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCC 996 

II III I II I I I I II II I I I I I I II II II II I I II II I II II II I I I II II 

Db 841 AGAGGAGATGCTCGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCC 900 

Qy 997 TTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGA 1056 
Mill Mill I II II I II II II I II I I II II I II II I II II II Mill 



Db 901 CTTTGATTTCTACATGGACTTGACATCGGTGGACACCCAAAGCAGAGAGCGAGAGATAGA 960 

Qy 1057 AACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAA 1116 

II I I I I I I I I I I I I I I II I I II I I II I I I II I I I I I I II I I I I I II 
Db 961 GACGTACAAGCGAGTCCAGATGCTGGAATCTGCCTTCAGGCAATCGGACATCTGTCACAA 1020 

Qy 1117 AACTTTGAAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAA 1176 

II II MM II II I II I II I I II II I I I M I I II I I I II II I I II I I M I 

Db 1021 AATCCTGGAGAACATTGAAAGAACAAGACACCTGAAAACCCTACCCATGGTTCCTTTCAA 1080 

Qy 1177 AACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAG 1236 

III III II I I II III I II I I II II M M I II I I I II II I I II I II II 

Db 1081 AACGAAAAATCCTCCCGGAATGTTCTGCAAGCTCGGCGTTCTCCTGAGGAGAGTAACGAG 1140 

Qy 1237 AAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGG 1296 

II I I I I II I II M I I I II I II II II I I M M II I I II M I I I M II II M 

Db 1141 AAACCTAATGAGGAATAAGCAGGTGGTGATTATGCGTCTTGTTCAGAATCTGATCATGGG 1200 

Qy 1297 TTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCA 1356 

I II I I I I M II I I II I II II I I I II I MM I I I I I II I I I I I I I I 

Db 1201 TCTGTTCCTCATTTTCTACCTTCTCCGAGTCCAGAACAACATGCTGAAGGGCGCTGTTCA 1260 

Qy 1357 GGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAA 1416 

I I M M II M II II I I II I II I I II II I I I I I I I I I II II I I I I I II II I II 

Db 1261 GGACCGCGTAGGGCTGTTGTACCAGCTTGTGGGTGCCACCCCGTACACCGGCATGCTCAA 1320 

Qy 1417 CGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCT 1476 

I I II I II I I II I II I I I II II I I I I I I II II I II I I I II I I M II I II I I I I II I 
Db 1321 CGCTGTGAACCTCTTTCCCATGCTGAGAGCTGTCAGCGACCAGGAGAGTCAGGATGGCCT 1380 

Qy 1477 CTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGT 1536 

M I I I I I I II I I I I I M I I I I I I I I II II Mill I I II I II I M II I I II 
Db 1381 GTACCAGAAGTGGCAGATGCTGCTCGCCTATGTGCTGCATGCTCTCCCCTTCAGCATCGT 1440 

Qy 1537 TGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGC 1596 

I I II I I I II II I II I I I I I I I I I I I I I I II I I I I I I I M I MM Mill M 
Db 1441 TGCCACGGTGATTTTCAGCAGCGTGTGTTACTGGACTCTGGGCTTGTATCCCGAGGTCGC 1500 

Qy 1597 CCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAAC 1656 

I II I II I I II I II Mill I I I II II II I M I II II I I II M I I II M I M II 
Db 1501 CAGATTTGGATACTTCTCTGCCGCTCTGTTGGCCCCTCACTTAATTGGAGAATTTCTGAC 1560 

Qy 1657 TCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCT 1716 

I II II II I I I M I I II I II I I I II II II II I I II II I M II I II I II I I II I 
Db 1561 ACTTGTGCTGCTTGGTATGGTCCAAAACCCCAATATTGTCAACAGCATAGTGGCTCTGCT 1620 

Qy 1717 GTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCAT 1776 

I III I III I II I I II II I M I II I I I II II II II I I I II II II II I I M 
Db 1621 GAGTATTTCTGGGTTGCTCATTGGATCTGGATTTATCAGAAACATAGAAGAAATGCCCAT 1680 

Qy 1777 TCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGT 1836 

I I II I I II I I I I I II I I I I I I I I I II I II I II II I II II M I II I II II 

Db 1681 TCCTTT7WWVTCCTGGGTTACTTTACCTTCCAAAAGTACTGTTGTGAGATTCTTGTGGT 1740 

Qy 1837 CAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAA 1896 

II I II II I I I M II I II I I II I I II II I M I II I II I I I I II I III III 

Db 1741 CAATGAGTTCTATGGCCTGAACTTCACTTGTGGTGGCTCCAACACTTCTGTGCCAAATAA 1800 



Qy 1897 TCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAAC 1956 

I I I I I I I M II I II II I II II I I II I II I I I II II I I I I I I I II I I I II II 

Db 1801 CCCAATGTGTTCCATGACCCAAGGGATCCAATTCATTGAGAAAACCTGCCCAGGGGCCAC 1860 

Qy 1957 ATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCT 2016 

II I I I I I II I I I I I I I I II I I I I II I I I I I M I I II I I I I I M I I M 

Db 1861 GTCCAGATTCACGACTWICTTCCTGATCTTGTACTCGTTCATCCCGACTCTTGTCATCCT 1920 

Qy 2017 AGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCT 2076 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I II I 
Db 1921 GGGGATGGTGGTCTTTAAAGTCCGGGACTACCTGATTAGCAGATAGGTAAGATGGCAGGC 1980 

Qy 2077 GGGAAAATGGAAGTG 2091 

I I I I I I III 
Db 1981 AGGAAAGGGTTAATG 1995 



RESULT 9 
ABK51684 

ID ABK51684 standard; DNA; 1915 BP. 
XX 

AC ABK51684; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW ds. 
XX 

OS Mus sp. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1915 

FT /*tag= a 

FT /partial 

FT /product= "Mouse ABCG5 protein" 

FT /transl_except= (pos: 1912..1915, aa: LGIVIFKVRDYLISR) 

FT /note= "This sequence lacks a stop codon" 

XX 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 



DR P-PSDB; AAU96985. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42-43; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the mouse ABCG5 protein of the invention. 

XX 

SQ Sequence 1915 BP; 453 A; 502 C; 484 G; 476 T; 0 U; 0 Other; 



Query Match 57.1%; Score 1335.8; DB 6; Length 1915; 

Best Local Similarity 81.5%; Pred. No. 0; 

Matches 1560; Conservative 0; Mismatches 352; Indels 3; Gaps 1; 

Qy 107 ATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGC 166 

I I I I M I I I I MINI Ml II III III II I I I II I I I I 
Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 

Qy 167 TCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTGGGC 223 

I I I I I M I I I I I M I I I I I I II I I I I I I I II I M I II I II 
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 224 ATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCT 283 

MM I I II I I I II I II II II I II I I I I II I I II II I M I I I I I II III 
Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCA 180 

Qy 284 TGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGG 343 

I II I MM II II I MUM I II I I II I II I II II II M II I II I I I II I II 

Db 181 TGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGC 240 

Qy 344 CAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATG 403 

II I II II I I M I II II II I II I II II I II II II I II I I II II II I I II II I II I 

Db 241 CAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATC 300 

Qy 404 TCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCG 463 

II II II I I I I M II I II I I II I M I I II I II II I I II II II III I I I 



Db 301 TCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAG 360 

Qy 464 CTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTG 523 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I III 

Db 361 CTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTG 420 

Qy 524 AGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGC 583 

I I II I M I II I I I M II I I I I II III I II I I III I II I I I I I II I I I II II 
Db 421 AGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGC 480 

Qy 584 AATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCAT 643 

I I II II I I II I I II I I MM! I I I I II I II I I I I I I I I I I I I I I I 
Db 481 TCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCAC 540 

Qy 644 GTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGC 703 

I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I I I I I I II I 

Db 541 GTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGC 600 

Qy 704 CGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCA 763 

II II I I I I I I I I I I II II II II I I I I I II I M I I I I I I II I I I I M I I I I 

Db 601 CGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCA 660 

Qy 764 ACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCT 823 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I III III II I I I I II 

Db 661 ACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCT 720 

Qy 824 CGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883 

MINI I I I I I I I I I II II I I I I M I I I I I II I I II I I I I I I II II M I I 
Db 721 CGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACAC 780 

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943 

II II I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I III 

Db 781 TTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAG 840 

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I M II II I I I I I 

Db 841 ATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGAT 900 

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCC 1063 

II II I I I II I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I 

Db 901 TTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTAC 960 

Qy 1064 AAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTG 1123 

III I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I III I I I I I II 

Db 961 AAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTG 1020 

Qy 1124 AAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAA 1183 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I III 

Db 1021 GAGAACATTGAAAGAGCACGATACCTGAAAACCTTACCCACGGTTCCTTTCAAAACAAAA 1080 

Qy 1184 GATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTG 1243 

III I I I I I I I I III II M I I I I I II I I I I I I I I I I I I I I M I I I I I 
Db 1081 GATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTA 1140 

Qy 1244 GTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTC 1303 

MM I II M I I II I II I II I II I I II I I I II II I II II I I I I II II II I 111 
Db 1141 ATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTC 1200 



Qy 1304 CTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGC 1363 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1201 CTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGC 1260 

Qy 1364 GTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTG 1423 

II II II II II Ml I I I I I I I I M I I I II II I I I I I I I I I I I II I I I I I I 

Db 1261 GTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCTWVTGCTGTG 1320 

Qy 1424 AATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAG 1483 

I I I I I I I I I I I I I I II I MM I I II I II I I I I I M I II I I I I I I Mill II II 

Db 1321 AATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCAT 1380 

Qy 1484 AAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACC 1543 

II I II I I II I II I I II I II II I II I II I M II I II I I I II II I I I II I II 

Db 1381 AAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACG 1440 

Qy 1544 ATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTT 1603 

I I M I I I I M I II II I I I M Mill I II I II II I II I I II I I II I I II II I 
Db 1441 GTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTT 1500 

Qy 1604 GGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTG 1663 

I I I I M I I I II I I I II I I I I I I I I II I I I M II II I I I I I I I I II I II I I II M I 
Db 1501 GGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTG 1560 

Qy 1664 CTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATT 1723 

II I II I I II I M II II II II I II II I I I I II II I II II II I I I I II I III 

Db 1561 CTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATC 1620 

Qy 1724 GCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT 1783 

I III II I II I I M I I II I M I I I II I I I I I II II I I I I I I I II I M II I II II 
Db 1621 TCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTA 1680 

Qy 1784 AAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAG 1843 

I I I II I I I I I I I I II II I I II I I II I I I I II I I II I II I I I II I I I I I II II 
Db 1681 AAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAG 1740 

Qy 1844 TTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATG 1903 

I I II I II I I I I I I M II I I II I I I I I I I I I I I I I I I I I I I II 
Db 1741 TTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATG 1800 

Qy 1904 TGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGA 1963 

II III I II I I I I II I II II I I I I I II II I II M I I I I I II I I I I I I I III 

Db 1801 TGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGA 1860 

Qy 1964 TTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAG 2018 

II I I I I I I I I II II II III Mill I II I I II I I II I I II I M 

Db 1861 TTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAG 1915 



RESULT 10 
ADB62671 

ID ADB62671 standard; cDNA; 2512 BP. 
XX 

AC ADB62671; 
XX 

DT 04-DEC-2003 (first entry) 



XX 

DE Human cDNA encoding clone LIVER20030650. 
XX 

KW Human; ss; gene; pharmaceutical; diagnostic; gene therapy; 

KW tissue regeneration; cell regeneration; membrane protein; 

KW signal transduction-related protein; transcription-related protein; 

KW osteoporosis; neurological disease; cancer; tumour. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 1469..2239 

FT /*tag= a 

FT /product= "Clone LIVER20030650 protein" 
XX 

PN EP1308459-A2. 
XX 

PD 07-MAY-2003. 
XX 

PF 28-MAR-2002; 2002EP-00007401. 
XX 

PR 05-NOV-2001; 2001JP-00379298. 

PR 25-JAN-2002; 2002US-00350978. 
XX 

PA (HELI-) HELIX RES INST. 

PA (REAS-) RES ASSOC BIOTECHNOLOGY. 

XX 

PI Isogai T, Sugiyama T, Otsuki T, Wakamatsu A, Sato Ishii S; 

PI Yamamoto J, Isono Y, Hio Y, Otsuka K, Nagai K, Irie R, Tamechika I; 

PI Seki N, Yoshikawa T, Otsuka M, Nagahari K, Masuho Y; 

XX 

DR WPI; 2003-450961/43. 

DR P-PSDB; ADB64641. 
XX 

PT New polynucleotides and polypeptides, useful for developing a diagnostic 

PT marker or medicines for regulation of their expression and activity, or 

PT as targets of gene therapy. 
XX 

PS Claim 1; Page; 222pp; English. 
XX 

CC The invention discloses a polynucleotide comprising a sequence selected 

CC from 1970 fully defined nucleotide sequences which encode novel 

CC polypeptides. Also claimed is a polypeptide encoded by the polynucleotide 

CC or its partial peptide, an antibody binding to the polypeptide or peptide 

CC of the polynucleotide, immunologically assaying the polypeptide or 

CC peptide of the polynucleotide by contacting the polypeptide or peptide 

CC with the antibody of the encoded protein, and observing the binding 

CC between the two, a transformant carrying the polynucleotide in an 

CC expressible manner and an antisense polynucleotide. The oligonucleotide 

CC is useful as a primer for synthesising the polynucleotide, or as a probe 

CC for detecting the polynucleotide. The polynucleotides and encoded 

CC proteins are useful as pharmaceutical agents and many disease-related 

CC genes may be included in them, for developing a diagnostic marker or 

CC medicines for regulation of their expression and activity, or as targets 

CC of gene therapy. The genes are involved in tissue and/or cell 

CC regeneration. Membrane proteins, signal transduction-related proteins, 

CC transcription-related proteins, disease-related proteins and genes 



CC encoding them can be used as indicators for diseases (e.g. osteoporosis, 

CC neurological diseases, cancer, tumours). The cDNA may be used to regulate 

CC the activity or expression of the encoded protein to treat diseases. The 

CC sequence presented is a cDNA of the invention. Note: Some of the sequence 

CC data for this patent is not represented in the printed specification, but 

CC is based on sequence information supplied by the European Patent Office. 
XX 

SQ Sequence 2512 BP; 543 A; 675 C; 701 G; 593 T; 0 U; 0 Other; 



Query Match 50.2%; Score 1174.2; DB 9; Length 2512; 

Best Local Similarity 71.0%; Pred. No. 0; 

Matches 1729; Conservative 0; Mismatches 603; Indels 103; Gaps 9; 

Qy 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M I I I I I I I I I I I I I I I I 
Db 81 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 140 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 141 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 200 

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 201 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 260 

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 261 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 320 

Qy 241 CAGCGTCAGCCACCGC GTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGT 296 

I I I I I I I I I I I I II I I I I I I I II II 

Db 321 CAGCGTCAGGTAAGGCAGAGCCCTTGCTGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGC 380 

Qy 297 GGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCA 356 

II I II I II I I II I I II I II I I 

Db 381 GCTCACCCCTCTGCTGCCTTTCTTCACTCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCC 440 

Qy 357 TCCTAGGAAGCTCAGGCTCC GGGAAAACCACGCTGCTGGACGCCATGTCC 406 

II I III I II I I I I I I I I I I I I I I 

Db 441 CTCTTTAGTGGATCGGGTGGAGAGAGGAGAGGGAGAAGGGCTGTGCTGGGAAACATGGAG 500 

Qy 407 GGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTG 466 

I I III I I I M I I I I I I I I I I 

Db 501 CGACAGTGAATGGCCCCTCCCCCTGCCCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAG 560 

Qy 467 CGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGC 526 

I I I I I I I I I I I I I I I I I II 

Db 561 CAGTGCCCTGCCAACCCAGTGTCTACGGCCTGCCCTCTGTGGATGGGAATGGGGGTACTG 620 

Qy 527 AGCCT CACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCAT 573 

II I I I I I M I I I I I I I I 

Db 621 CGAATGCAAGGAGTCTTGAAACCTGGTGAAAGAATGCAGGGACAGCCACCTCGCAGCCAA 680 

Qy 574 CCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAG 633 

II II II I I I I I I I I I I II I I I I I I I I 

Db 681 ACGGACAGGACATTCAGAGCAACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCT 740 



Qy 634 TCTGAGCCATGTGGCAG ACCGACTGATTGGCAACTACAGCT 674 

II II II III M I III 

Db 741 CAGTCGCTATCTGCCAGGTTCTACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCC 800 

Qy 675 TGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCAT CG 716 

II II I I M I I I I I I I I I I 

Db 801 TGTCCGGAGACTACTGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGC 860 

Qy 717 CAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGG 776 

I I I I I M I I I MINI I II II II 

Db 861 CCCTTCCAGGGCCCCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTG 920 

Qy 777 ACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAA 836 

I III I I I I I I I I I I I I I I M I III 

Db 921 AGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGAT 980 

Qy 837 TTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAG CTCTTTGAC 889 

Mill I I I I I I I M I I I Ml 

Db 981 GTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGG 1040 

Qy 890 AAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTT 949 

I I I I I I I I I I I M I I I I I I I M 

Db 1041 AAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGG 1100 

Qy 950 GATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTAT 1009 

I I I I II I I I Ml II Mill 

Db 1101 GAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGT TCCAGGACTGCTTCTCC 1157 

Qy 1010 ATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGA 1069 

I I I M I I I I I M I I I I I I 

Db 1158 TACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTAC 1217 

Qy 1070 GTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAAT 1129 

I I II I I II I I I I I I I II I MM 

Db 1218 ACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCC 1277 

Qy 1130 AT TGAAAGAATGAAACACCTGAAAACGTTACCAA 1163 

I I I I M II I I I I I 

Db 1278 GTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTG 1337 

Qy 1164 TGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGA 1223 

I I I I II I I II I I I I I I I I I II I I 

Db 1338 GGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGAT 1397 

Qy 1224 GGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTC 1280 

II I II I II I I II I I I II II M I I I II I I I I II I I I I II M I I I I M I I II I II I I I 
Db 1398 CCTAGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTC 1457 

Qy 1281 AGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGC 1340 

II I M I I II I I I I I II M I I I I I I I I I I II I II I I I I I II I I I I I II I II I I II I M I I I 

Db 1458 AGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGC 1517 

Qy 1341 TAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGT 1400 

M I I II I II I I II I M I II I II I I I I I I I II II I I I M II I I I I M I I I I II M I I I I II 
Db 1518 TAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGT 1577 

Qy 1401 ACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGG 1460 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1578 ACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGG 1637 

Qy 1461 AGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCC 1520 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1638 AGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCC 1697 

Qy 1521 TCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCT 1580 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1698 TCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCT 1757 

Qy 1581 TACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAA 1640 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1758 TACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAA 1817 

Qy 1641 TTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACA 1700 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1818 TTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACA 1877 

Qy 1701 GTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACA 1760 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1878 GTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACA 1937 

Qy 1761 TACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCA 1820 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1938 TACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCA 1997 

Qy 1821 GTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATG 1880 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1998 GTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATG 2057 

Qy 1881 TTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAA 1940 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2058 TTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAA 2117 

Qy 1941 CCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTC 2000 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2118 CCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTC 2177 

Qy 2001 CAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGT 2060 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2178 CAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGT 2237 

Qy 2061 AGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGA 2120 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2238 AGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGA 2297 

Qy 2121 ACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAAC 2180 

        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2298 ACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAAC 2357 


Qy 


2181 


CATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGT 


2240 




1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 




Db 


2358 


CATTAAGACTCCATTTGTGCCTCTTGGATCCTy^GCAGGCCTTGAATGCAATGGAAGTGGT 


2417 


Qy 


2241 


TTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGA 


2300 



I I I I I I I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 



2418 TTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGA 247 7 

2301 GCGGACCCAAG7\ATGTAAATAATATTCATA7\ACCT 2335 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2478 GCGGACCCAAGAATGTAAAT7\ATATTCATAAACCT 2512 



RESULT 11 
ABK51687 

ID ABK51687 standard; cDNA; 1069 BP.
XX 

AC ABK51687; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding hamster ABCG5 protein. 
XX 

KW Hamster; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ss.
XX

OS Cricetinae.



XX 

FH Key Location/Qualifiers 

FT CDS 30..1049

FT /*tag= a

FT /partial

FT /product= "Hamster ABCG5 protein"

FT /note= "This sequence lacks both a start and stop codon"

XX 



PN WO200227016-A2.
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96987. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 47; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a

CC predisposition for developing sitosterolemia, arteriosclerosis or heart



CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the hamster ABCG5 protein of the invention. 

CC (Updated on 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 1069 BP; 266 A; 282 C; 273 G; 248 T; 0 U; 0 Other; 

Query Match 32.9%; Score 770.2; DB 6; Length 1069; 

Best Local Similarity 83.7%; Pred. No. 6.4e-203; 

Matches 896; Conservative 0; Mismatches 173; Indels 2; Gaps 2; 

Qy  368 TCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGG 427
        |||||||| |||||||||||| ||||||  ||||| |||||||||||| | ||| | |||
Db    1 TCAGGCTCAGGGAAAACCACGTTGCTGG-TGCCATCTCCGGGAGGCTGCGACGCACAGGG 59

Qy  428 ACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGAC 487
        ||| |    |||||||||| |||||||||||| | ||||||| |||| |||||||| |||
Db   60 ACCCTGGAAGGGGAGGTGTTTGTGAACGGCCGTGAGCTGCGCAGGGACCAGTTCCAAGAC 119

Qy  488 TGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACG 547
        ||||||||||| |||||||||||||||  | | |||||||| ||||| ||||| ||||||
Db  120 TGCTTCTCCTATGTCCTGCAGAGCGACGTCTTTCTGAGCAGTCTCACGGTGCGAGAGACG 179

Qy  548 CTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAG 607
        |||| |||||| ||| |||||||| ||||| |  ||  | | | || ||   | ||||||
Db  180 CTGCGCTACACGGCGATGCTGGCCCTCCGCAGTAGCTCTTCGGACTTCTATGACAAGAAG 239

Qy  608 GTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAAC 667
        || ||||| ||||||| |||||| ||||||||||| |||||||||||| |||||||||||
Db  240 GTAGAGGCAGTCATGGAAGAGCTAAGTCTGAGCCACGTGGCAGACCGAATGATTGGCAAC 299

Qy  668 TACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTG 727
        || |  || ||||| |||||||  || ||||||||||| ||||||||||||||||| ||
Db  300 TATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTCTCCATCGCAGCCCAACTC 359

Qy  728 CTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACT 787
         | ||||| || ||| ||||| |||||||||||||||||||||| |||||||||||||||
Db  360 ATTCAGGACCCCAAGATCATGATGTTTGATGAGCCAACCACAGGACTGGACTGCATGACT 419

Qy  788 GCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTC 847
        || ||||| |||||| |||||||||  || |||||||||||| |||| |||||| |  ||
Db  420 GCAAATCAAATTGTCATCCTCCTGGCAGAGCTGGCTCGCAGGGACCGCATTGTGATCGTC 479

Qy  848 ACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGC 907
        ||||| |||||||| || |||||||| ||||| | ||| |||||||||||||||||||
Db  480 ACCATCCACCAGCCTCGCTCTGAGCTCTTTCAACACTTCGACAAAATTGCCATCCTGACT 539

Qy  908 TTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGC 967
        | ||||||| || | |||||||||||||| | ||||||||| || |||||||||  |||
Db  540 TACGGAGAGATGGTGTTCTGTGGCACGCCGGAGGAAATGCTCGACTTCTTCAATAGCTGT 599

Qy  968 GGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTG 1027
        ||||||||||||||||||||||| ||||| ||||||||||| |||||| |||| ||||||
Db  600 GGTTACCCTTGTCCTGAACATTCCAACCCCTTTGACTTCTACATGGACTTGACATCAGTG 659

Qy 1028 GATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCT 1087
        |||||||| ||||  || || ||||||||||||| |||||||||||||||| | ||||||
Db  660 GATACCCAGAGCAGAGAGCGAGAAATAGAAACCTACAAGAGAGTCCAGATGCTCGAATCT 719

Qy 1088 GCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAATGAAACAC 1147
        |||| ||   | || ||| | ||||| ||||   || ||||||||||||| |  ||||||
Db  720 GCCTTCAGAGACTCTGCAGTCTGTCACAAAATCCTGGAGAATATTGAAAGGACAAAACAC 779

Qy 1148 CTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAA 1207
        |||||||| ||||| ||| ||||||||||||| |||||| |||||||| | |||| |||
Db  780 CTGAAAACCTTACCCATGATTCCTTTCAAAACGAAAGATCCTCCTGGAATGTTCTGTAAG 839

Qy 1208 CTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATT 1267
        |||||||| ||| |||||||||| |||||||||||  ||||||| |||| |||||||||
Db  840 CTGGGTGTCCTCTTGAGGAGAGTTACAAGAAACTTAATGAGAAACAAGCAGGCAGTGATC 899

Qy 1268 ACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTC 1327
        | ||||||  |||||||||| ||||||||| |||||||| |||||| | |||| ||||||
Db  900 ATGCGTCTTGTTCAGAATCTCATCATGGGTCTGTTCCTCATTTTCTACCTTCTTCGGGTC 959

Qy 1328 CGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTG 1387
        |  | | |  | |||||||| |||||||||||||| || ||||| || || ||| | ||
Db  960 CAGAACGACATACTAAAGGGCGCTATCCAGGACCGTGTGGGTCTGCTATA-CAGCTGGTC 1018

Qy 1388 GGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTG 1438
        ||||||||||||||||| |||||||| |||||||||||| |||||||| ||
Db 1019 GGCGCCACCCCGTACACCGGCATGCTCAACGCTGTGAATTTGTTTCCCATG 1069



RESULT 12 
AAD22022 

ID AAD22022 standard; DNA; 472 BP. 
XX 

AC AAD22022; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) exon 13. 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis;

KW xanthoma; haemolytic anaemia; transgenic animal; ds.

XX

OS Homo sapiens.



XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.

PR 15-MAY-2000; 2000US-0204234P.
XX

PA (TULA-) TULARIK INC.
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 73; Fig 14B; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene
CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP)
CC binding cassette (ABC) family cholesterol transporter. SSG is useful for
CC identifying a compound useful in the treatment or prevention of a sterol-
CC related disorder, including sitosterolaemia, hyperlipidaemia,
CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or
CC nutritional deficiencies. SSG is also useful for treating cholesterol-
CC associated diseases or conditions including coronary heart disease and
CC other cardiovascular diseases, and sitosterolaemia-associated condition
CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG
CC expression cassette is useful in the production of transgenic non-human
CC animals. SSG genes and their homologues are useful as tools for a number
CC of applications including diagnosing sitosterolaemia and other
CC cardiovascular disorders, for forensics and paternity determinations, and
CC for treating any of a large number of SSG associated diseases. The
CC present sequence is an exon of human SSG DNA
XX 

SQ Sequence 472 BP; 134 A; 93 C; 100 G; 145 T; 0 U; 0 Other; 

Query Match 20.2%; Score 472; DB 6; Length 472; 

Best Local Similarity 100.0%; Pred. No. 2.7e-120; 

Matches 472; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1869 GCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAAT 1928
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAAT 60

Qy 1929 TCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGT 1988
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGT 120

Qy 1989 ATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATC 2048
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATC 180

Qy 2049 TCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCA 2108
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 TCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCA 240

Qy 2109 TGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTC 2168
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 TGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTC 300

Qy 2169 AAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGC 2228
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 AAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGC 360

Qy 2229 AATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGA 2288
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 AATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGA 420

Qy 2289 AATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 AATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 472



RESULT 13 
AAC76065 

ID AAC76065 standard; cDNA; 432 BP. 
XX 

AC AAC76065; 
XX 

DT 08-FEB-2001 (first entry) 
XX 

DE Human ORFX ORF1620 polynucleotide sequence SEQ ID NO: 3239. 
XX 

KW Human; open reading frame; ORFX; detection; cytostatic; hepatotropic; 

KW vulnerary; antipsoriatic; antiparkinsonian; nootropic; neuroprotective; 

KW anticonvulsant; osteopathic; antiarthritic; immunosuppressant; cardiant; 

KW immunostimulant; thrombolytic; coagulant; vasotropic; antidiabetic; 

KW hypotensive; dermatological; immunosuppressive; antiinflammatory;

KW antiviral; antibacterial; antifungal; antirheumatic; antithyroid; 

KW antianaemic; gene therapy; cancer; proliferative disorder; hypertension; 

KW neurodegenerative disorder; osteoarthritis; graft vs host disease; 

KW cardiovascular disease; diabetes mellitus; hypothyroidism; SCID; AIDS; 

KW cholesterol ester storage; systemic lupus erythematosus; infection; 

KW severe combined immunodeficiency; malaria; autoimmune disorder; asthma; 

KW allergy; aplastic anaemia; nocturnal haemoglobinuria; burn; wound; 

KW bone damage; cartilage damage; antiinflammatory disease; coagulation; 

KW thrombosis; contraceptive; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO200058473-A2. 
XX 

PD 05-OCT-2000. 
XX 

PF 31-MAR-2000; 2000WO-US008621.
XX

PR 31-MAR-1999; 99US-0127607P.

PR 02-APR-1999; 99US-0127636P.

PR 05-APR-1999; 99US-0127728P.

PR 30-MAR-2000; 2000US-00540763.
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Shimkets RA, Leach M; 
XX 

DR WPI; 2000-602362/57. 

DR P-PSDB; AAB41856. 
XX 

PT Novel nucleic acids and peptides derived from open reading frame X, 

PT useful for treating e.g. cancers, proliferative disorders, 

PT neurodegenerative disorders and cardiovascular disease. 
XX 

PS Claim 5; Page 2444; 5507pp; English. 
XX 

CC AAC74446 to AAC77605 encode the proteins given in AAB40237 to AAB43397, 

CC which represent the human ORFX open reading frames 1 to 3161. The ORFX 

CC sequences have activities such as: cytostatic; hepatotropic; vulnerary; 

CC antipsoriatic; antiparkinsonian; nootropic; neuroprotective; osteopathic; 

CC anticonvulsant; antiarthritic; immunosuppressant; immunostimulant;

CC cardiant; thrombolytic; coagulant; vasotropic; antidiabetic; hypotensive; 

CC dermatological; immunosuppressive; antiinflammatory; antibacterial; 

CC antiviral; antifungal; antirheumatic; antithyroid; and antianaemic. The 

CC sequences can be used for determining the presence of or predisposition 

CC to, or preventing or treating pathological conditions associated with an 

CC ORFX-associated disorder. The nucleic acids can be used to express ORFX 

CC proteins in gene therapy vectors. The proteins and nucleic acids may be 

CC used to treat cancers, proliferative disorders, neurodegenerative 

CC disorders, osteoarthritis, graft vs host disease, cardiovascular disease, 

CC diabetes mellitus, hypertension, hypothyroidism, cholesterol ester 

CC storage, systemic lupus erythematosus, severe combined immunodeficiency 

CC (SCID), AIDS, viral, bacterial or fungal infection, malaria, autoimmune 

CC disorders, asthma, allergies, aplastic anaemia, burns, wounds, bone and 

CC cartilage damage, nocturnal haemoglobinuria, antiinflammatory disease; to 

CC enhance coagulation; to inhibit thrombosis; and as a contraceptive 

XX 

SQ Sequence 432 BP; 87 A; 110 C; 118 G; 117 T; 0 U; 0 Other; 

Query Match 18.4%; Score 429.4; DB 3; Length 432; 

Best Local Similarity 99.8%; Pred. No. 1.7e-108; 

Matches 430; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1175 AAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACA 1234
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 AAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACA 60

Qy 1235 AGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATG 1294
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 AGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATG 120

Qy 1295 GGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATC 1354
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 GGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATC 180

Qy 1355 CAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTG 1414
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 CAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTG 240

Qy 1415 AACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGC 1474
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 AACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGC 300

Qy 1475 CTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTT 1534
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 CTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTT 360

Qy 1535 GTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTT 1594
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 GTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTT 420

Qy 1595 GCCCGATTTGG 1605
        |||||||| ||
Db  421 GCCCGATTGGG 431



RESULT 14
AAZ94755
ID AAZ94755 standard; cDNA; 281 BP.
XX
AC AAZ94755;
XX
DT 01-AUG-2000 (first entry)
XX
DE Human ATP binding cassette cDNA fragment 168043.
XX
KW ATP binding cassette; human; cholesterol; lipid disorder;
KW atherosclerosis; lipid disorder; dyslipidemia; psoriasis;
KW lupus erythematosus; diagnosis; gene therapy; ss.
XX
OS Homo sapiens.
XX
PN WO200018912-A2.
XX
PD 06-APR-2000.
XX
PF 21-SEP-1999; 99WO-EP006991.
XX
PR 25-SEP-1998; 98US-0101706P.
XX
PA (FARB ) BAYER AG.
XX
PI Schmitz G, Klucken J;
XX
DR WPI; 2000-293151/25.
XX
PT Adenosine triphosphate binding proteins useful for identifying agents for
PT treating atherosclerosis and other inflammatory disorders.
XX
PS Claim 9; Page 135; 154pp; English.
XX
CC The present sequence is that of human ATP binding cassette (ABC) cDNA
CC fragment 168043, identified as a cholesterol-sensitive gene fragment. The
CC invention provides cholesterol-sensitive ABC genes (see AAZ94734-63).
CC These genes, and polypeptides encoded by them, can be used for diagnostic
CC and therapeutic applications, and for biochemical or cell-based assays to
CC screen for pharmacologically active modulator compounds useful for the
CC treatment of lipid disorders, atherosclerosis or other inflammatory
CC diseases such as psoriasis and lupus erythematosus
XX 

SQ Sequence 281 BP; 60 A; 68 C; 73 G; 80 T; 0 U; 0 Other; 

Query Match 11.5%; Score 268; DB 3; Length 281; 

Best Local Similarity 99.6%; Pred. No. 9.3e-64; 

Matches 279; Conservative 0; Mismatches 0; Indels 1; Gaps 1; 

Qy 1175 AAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAG-TGAC 1233
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||
Db    1 AAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTTGAC 60

Qy 1234 AAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCAT 1293
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 AAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCAT 120

Qy 1294 GGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTAT 1353
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 GGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTAT 180

Qy 1354 CCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCT 1413
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 CCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCT 240

Qy 1414 GAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGC 1453
        ||||||||||||||||||||||||||||||||||||||||
Db  241 GAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGC 280



RESULT 15
ABK51683
ID ABK51683 standard; DNA; 5460 BP.
XX
AC ABK51683;
XX
DT 30-JUL-2002 (first entry)
XX
DE Human ABCG5 upstream genomic sequence, exon 1, intron 1 and exon 2.
XX
KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;
KW chromosome 2p21; ds.
XX
OS Homo sapiens.
XX
PN WO200227016-A2.
XX
PD 04-APR-2002.
XX
PF 25-SEP-2001; 2001WO-US029859.
XX
PR 25-SEP-2000; 2000US-0235268P.
XX
PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.



PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 38-41; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the upstream genomic sequence, exon 1, intron 1 

CC and exon 2 of the human ABCG5 gene located on chromosome 2p21 



SQ Sequence 5460 BP; 1351 A; 1350 C; 1508 G; 1243 T; 0 U; 8 Other; 

Query Match 10.7%; Score 249.6; DB 6; Length 5460; 

Best Local Similarity 98.4%; Pred. No. 7.2e-58; 

Matches 252; Conservative 0; Mismatches 4; Indels 0; Gaps 0; 



Qy    1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4504 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 4563

Qy   61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4564 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 4623

Qy  121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4624 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 4683

Qy  181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4684 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 4743

Qy  241 CAGCGTCAGCCACCGC 256
        |||||||||  |  ||
Db 4744 CAGCGTCAGGTAAGGC 4759



Search completed: February 26, 2004, 01:19:55
Job time : 599.658 secs
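
The summary figures printed for each result above can be cross-checked against one another. The sketch below assumes that Query Match is the score divided by the perfect score, that Best Local Similarity is matches divided by aligned columns, and that the score is one point per match, minus 0.6 per mismatch, minus Gapop + Gapext per single-base gap. None of these rules is stated in the report: the 0.6 mismatch penalty is inferred from the numbers for results 11 to 15, the gap rule is only checked for gaps of length 1, and all function names are hypothetical.

```python
# Hypothetical helpers. The scoring rules are assumptions inferred from the
# figures printed in this report, not documented GenCore behaviour.

PERFECT_SCORE = 2340  # "Perfect score" from the search header


def alignment_score(matches, mismatches, gap_lengths,
                    mismatch_penalty=0.6, gap_open=10.0, gap_ext=1.0):
    """Assumed rule: matches - penalty * mismatches - sum of gap costs."""
    gap_cost = sum(gap_open + gap_ext * length for length in gap_lengths)
    return matches - mismatch_penalty * mismatches - gap_cost


def query_match(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect score."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# (matches, mismatches, gap lengths) as printed for results 11 to 15
reported = {
    "ABK51687": (896, 173, [1, 1]),
    "AAD22022": (472, 0, []),
    "AAC76065": (430, 1, []),
    "AAZ94755": (279, 0, [1]),
    "ABK51683": (252, 4, []),
}

for name, (m, mm, gaps) in reported.items():
    score = alignment_score(m, mm, gaps)
    print(name,
          round(score, 1),
          round(query_match(score), 1),
          round(best_local_similarity(m, mm, sum(gaps)), 1))
```

Under these assumptions the loop reproduces the Score, Query Match and Best Local Similarity values listed for results 11 to 15, which is a quick way to spot a mis-transcribed figure in a record.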



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: February 26, 2004, 00:48:03 ; Search time 113.204 Seconds
(without alignments)
11471.161 Million cell updates/sec

Title: US-09-989-981A-5
Perfect score: 2340
Sequence: 1 gtcaggtggagcaggcaggg aatattcataaacctatggg 2340

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 682709 seqs, 277475446 residues

Total number of hits satisfying chosen parameters: 1365418

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Issued_Patents_NA:*

1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:*
2: /cgn2_6/ptodata/2/ina/5B_COMB.seq:*
3: /cgn2_6/ptodata/2/ina/6A_COMB.seq:*
4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:*
5: /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq:*
6: /cgn2_6/ptodata/2/ina/backfiles1.seq:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
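
The header figures for this search are mutually consistent under a simple reading, sketched below. It assumes that one "cell update" is one query base compared with one database residue, and that each database sequence is searched on both strands, which would also explain why the hit count is twice the number of sequences. Both points are inferences from the printed numbers rather than anything the report states, and the function name is hypothetical.

```python
# Hypothetical check of the search header. The both-strands factor of 2 is an
# assumption inferred from the hit count, not a documented GenCore rule.

def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second, in millions."""
    cells = strands * query_len * db_residues
    return cells / seconds / 1e6


QUERY_LEN = 2340           # query length (equals the perfect score here)
DB_SEQS = 682709           # "Searched: ... seqs"
DB_RESIDUES = 277475446    # "Searched: ... residues"
SEARCH_SECONDS = 113.204   # "Search time"

rate = million_cell_updates_per_sec(QUERY_LEN, DB_RESIDUES, SEARCH_SECONDS)
hits = 2 * DB_SEQS

print(f"{rate:.1f} million cell updates/sec")
print(f"{hits} hits")
```

The computed rate lands within a fraction of a unit of the printed 11471.161, with the small residue presumably down to rounding of the search time, and the hit count matches the printed 1365418 exactly.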

SUMMARIES 
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No. 


Score 


Query 

Match Length DB 


ID 
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4.1 


235 
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US-09-172-108-8 
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83 


3.5 


3376 
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US-09-620-312D-918 
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3.1 


4159 
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US-09-614-912-139 


5 


61.4


2.6 
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4 


US-09-489-039A-2869 


6 
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2.6 
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4 


US-09-489-039A-4920


7 


59.4 


2.5 
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4 


US-09-489-039A-3218 


8 


57 
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4 


US-09-252-991A-13705 


9 
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US-09-103-840A-2 
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56.8 


2.4 


4411529 


3 


US-09-103-840A-1 


11 


56.2 


2.4 


972 


4 


US-09-252-991A-9848



Description 



Sequence 2, Appli 
Sequence 8, Appli 
Sequence 918, App 
Sequence 139, App 
Sequence 2869, Ap 
Sequence 4920, Ap 
Sequence 3218, Ap 
Sequence 13705, A 
Sequence 2, Appli 
Sequence 1, Appli 
Sequence 9848, Ap 
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1713 
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us- 


09-252- 


991A-9760 


Sequence 


9760, Ap 


c 


13 
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.4 
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4 


US- 


09-252- 


991A-10208 


Sequence 


10208, A 


c 


14 


55.6 
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.4 


1509 


4 


US- 


09-252- 


991A-13436 


Sequence 


13436, A 




15 


54.2 
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.3 
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4 


us- 


09-252- 


991A-15851 


Sequence 


15851, A 




16 


54 


2 
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US- 


09-252- 


991A-13657 


Sequence 


13657, A 
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54 
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.3 
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us- 


09-252- 


991A-13575 


Sequence 


13575, A 


c 
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53.4 
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.3 
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4 


us- 


09-489- 


039A-4877 


Sequence 


4877, Ap 
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53.4 


2 


.3 
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4 


us- 


09-489- 


039A-4894 


Sequence 


4894, Ap 
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53.4 
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,3 
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4 


us- 


09-252- 


991A-10998 


Sequence 


10998, A 




21 


53.4 


2 


.3 


1335 


4 


us- 


09-252- 


991A-10934 


Sequence 


10934, A 


c 


22 


53.4 


2 


.3 


2178 


4 


us- 


09-252- 


991A-11254 


Sequence 


11254, A 


c 
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53 
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.3 
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3 


us 


-09-103 


-840A-2


Sequence 


2, Appli 
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53 
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.3 


4411529 


3 


us 


-09-103 


-840A-1 


Sequence 


1, Appli 


c 


25 
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2 


.2 
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4 


US-09-489-039A-6200  Sequence 6200, Ap


c  26   52.6    2.2    1800  4  US-09-489-039A-5597  Sequence 5597, Ap




27 
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.2 
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4 


US-09-489-039A-5579  Sequence 5579, Ap


c 
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52 
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.2 
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4 


US-09-489-039A-6117  Sequence 6117, Ap
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.2 
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us- 


09-489- 


039A-5958 


Sequence 


5958, Ap 


c 
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.2 
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us- 


09-252- 


991A-3138 


Sequence 


3138, Ap 




31 


51.6 
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.2 


1476  4  US-09-252-991A-2825  Sequence 2825, Ap




32 


51 
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.2 
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4 


US-09-252-991A-8027  Sequence 8027, Ap


c 


33 


51 
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.2 
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us- 


09-252- 


991A-8287 


Sequence 


8287, Ap 
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51 
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.2 


1209  4  US-09-489-039A-6373  Sequence 6373, Ap
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us- 


09-252- 


991A-16348 


Sequence 


16348, A 
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.2 
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4 


US-09-252-991A-16477  Sequence 16477, A


c 
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.2 
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4 


US-09-252-991A-16015  Sequence 16015, A


c 


38 


50.6 
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.2 
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4 


US-09-252-991A-15936  Sequence 15936, A




39 


50.6 
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.2 


2427  4  US-09-252-991A-16255  Sequence 16255, A




40 


50.2 


2 


.1 
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us- 


09-540- 


236-1011 


Sequence 


1011, Ap 




   41   50.2    2.1  269223  4  US-09-596-002-41      Sequence 41, Appl
   42   50      2.1     744  4  US-09-252-991A-13301  Sequence 13301, A
   43   50      2.1     783  4  US-09-252-991A-12981  Sequence 12981, A
c  44   50      2.1     786  4  US-09-252-991A-12498  Sequence 12498, A
   45   50      2.1     987  4  US-09-489-039A-2741   Sequence 2741, Ap



ALIGNMENTS 



RESULT 1 
US-09-245-808-2 

; Sequence 2, Application US/09245808 

; Patent No. 6313277 

; GENERAL INFORMATION: 

; APPLICANT: Doyle, L. Austin 

; APPLICANT: Abruzzo, Lynne V. 

; APPLICANT: Ross, Douglas D. 

; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 
; TITLE OF INVENTION: encodes it 

; FILE REFERENCE: Ross UMb conversion
; CURRENT APPLICATION NUMBER: US/09/245,808
; CURRENT FILING DATE: 1999-02-05
; EARLIER APPLICATION NUMBER: 60/073763
; EARLIER FILING DATE: 1998-02-05
; NUMBER OF SEQ ID NOS: 7
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 2
; LENGTH: 2418
; TYPE: DNA
; ORGANISM: Human MCF-7/AdrVp cells
US-09-245-808-2 

Query Match 4.2%; Score 98; DB 4; Length 2418; 

Best Local Similarity 50.2%; Pred. No. 4e-17; 

Matches 242; Conservative 0; Mismatches 240; Indels 0; Gaps 0; 

Qy 617 GTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTG 676

I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 

Db 728 GTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTATC 787

Qy 677 GGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGAT 736 

I I I I I I I I I I I II I I I I I I I I I I III 

Db 788 CGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTGAT 847 

Qy 737 CCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAG 796

III I I I I I I I I I I I II I I I I I I IN I I I M I I I 

Db 848 CCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATGCT 907 

Qy 797 ATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCAC 856 

I I I I M I I I I I I I I I II I I I I I I II I I I I 

Db 908 GTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTCAT 967

Qy 857 CAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAG 916 

I I I I I I I I I III I I I I I I I I I I III II II III 

Db 968 CAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAAGA 1027 

Qy 917 CTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCT 976

I I I I Ill II I I I I I I I I I I I 

Db 1028 CTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATCAC 1087 

Qy 977 TGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGACGTCAGTGGATACCCAA 1036 

III I II I I I I I I I I I I I I I I I II I I I I I I I I I 

Db 1088 TGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCATTAATGGAGATTCCACT 1147 

Qy 1037 AGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGATAGAATCTGCCTACAAG 1096 

II I I I I I III I I I I I I I 

Db 1148 GCTGTGGCATTAAACAGAGAAGAAGACTTTAAAGCCACAGAGATCATAGAGCCTTCCAAG 1207

Qy 1097 AA 1098 

I 

Db 1208 CA 1209



RESULT 2 
US-09-172-108-8 

; Sequence 8, Application US/09172108 

; Patent No. 6160104 

; GENERAL INFORMATION:

; APPLICANT: Cunnigham, Mary Jane 

; APPLICANT: Zweiger, Gary B. 

; APPLICANT: Panzer, Scott R. 

; APPLICANT: Seilhamer, Jeffrey J. 

; TITLE OF INVENTION: MARKERS FOR PEROXISOMAL PROLIFERATORS 
; FILE REFERENCE: PA-0012 US
; CURRENT APPLICATION NUMBER: US/09/172,108
; CURRENT FILING DATE: 1998-10-13 
; NUMBER OF SEQ ID NOS : 56 

SOFTWARE: PERL Program 
; SEQ ID NO 8 
; LENGTH: 235 
; TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: - 

OTHER INFORMATION: 700138117H1 
US-09-172-108-8 



Query Match 4.1%; Score 96.6; DB 3; Length 235; 

Best Local Similarity 68.9%; Pred. No. 2.5e-17;

Matches 162; Conservative 0; Mismatches 69; Indels 4; Gaps 2; 

Qy 75 GAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCG 134

II III III I M I II I I I I I I I II M M I I M I I I II I 111 
Db 1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60 

Qy 135 GAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTG 194 

II III III II I I I I I I I I II I I I I I I II I I I II I I I I II 

Db 61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120 

Qy 195 CCACCGCCCCGGAGCCT CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCC 251 

I I I I I I I I I I I I I I I I I II II II I II II II I M I I I I I I 
Db 121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAG-A 179 

Qy 252 ACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCA 306 

I I I I M I I I II I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I 
Db 180 ACCGTGTCGGGCCCTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGAA 234 



RESULT 3 

US-09-620-312D-918 

; Sequence 918, Application US/09620312D
; Patent No. 6569662
; GENERAL INFORMATION:
; APPLICANT: Tang, Y. Tom
; APPLICANT: Liu, Chenghua
; APPLICANT: Asundi, Vinod
; APPLICANT: Zhang, Jie
; APPLICANT: Ren, Feiyan
; APPLICANT: Chen, Rui-hong
; APPLICANT: Zhao, Qing A.
; APPLICANT: Wehrman, Tom
; APPLICANT: Xue, Aidong J.
; APPLICANT: Yang, Yonghong
; APPLICANT: Wang, Jian-Rui
; APPLICANT: Zhou, Ping
; APPLICANT: Ma, Yunqing
; APPLICANT: Wang, Dunrui
; APPLICANT: Wang, Zhiwei
; APPLICANT: John Tillinghast
; APPLICANT: Drmanac, Radoje T.
; TITLE OF INVENTION: Novel Nucleic Acids and
; TITLE OF INVENTION: Polypeptides



; FILE REFERENCE: 784CIP2B 

; CURRENT APPLICATION NUMBER: US/09/620,312D

; CURRENT FILING DATE: 2000-07-19 

; PRIOR APPLICATION NUMBER: 09/552,317 

; PRIOR FILING DATE: 2000-04-25 

; PRIOR APPLICATION NUMBER: 09/488,725 

; PRIOR FILING DATE: 2000-01-21 

; NUMBER OF SEQ ID NOS : 1105 

; SOFTWARE: pt_FL_genes Version 1.0
; SEQ ID NO 918
; LENGTH: 3376
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (1)..(2808)
US-09-620-312D-918 



Query Match 3.5%; Score 83; DB 4; Length 3376; 

Best Local Similarity 49.6%; Pred. No. 7.9e-13; 

Matches 289; Conservative 0; Mismatches 270; Indels 24; Gaps 2; 
Qy 309 TCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCT 368 

I I I I I I MM I I M M I I I M M M I I M M 

Db 68 TTCTCAAGTGCCTCTCAGGTAAATTCTGCCGCCGGGAGCTGATTGGCATCATGGGCCCCT 127 

Qy 369 CAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGA 428 

MM I M M Ml I MM I M I M Ml I II 

Db 128 CAGGGGCTGGCAAGTCTACATTCATGAACATCTTGGCAGGATACAGGGAGTCTGGAATGA 187

Qy 429 CCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACT 488 

M M I I I II II II Mill II I MM I 

Db 188 AGGGGCAGATCCTGGTTAATGGAAGGCCACGGGAGCTGAGGACCTTCCGCAAGA 241 

Qy 489 GCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGC 548 

I II M II M I II I MM MUM II M II M I M I 

Db 242 TGTCCTGCTACATCATGCAAGATGACATGCTGCTGCCGCACCTCACGGTGTTGGAAGCCA 301 

Qy 549 TGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGG 608 

II MM III II II II II I 

Db 302 TGATGGTCTCTGCTAACCTGAAGCTGAGTGAGA AGCAGGAGG 343 

Qy 609 TGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACT 668

II II I III II II I I II I I Ml 

Db 344 TGAAGAAGGAGCTGGTGACAGAGATCCTGACGGCACTGGGCCTGATGTCGTGCTCCCACA 403 

Qy 669 ACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGC 728

M I III M II I I II I II II II I I I I I I 

Db 404 CGAGGACAGCCCTGCTCTCTGGCGGGCAGAGGAAGCGTCTGGCCATCGCCCTGGAGCTGG 463 

Qy 729 TCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTG 788

Ml III II I I M II I I II I II II I II II II I I I II II II 

Db 464 TCAACAACCCGCCTGTCATGTTCTTTGATGAGCCCACCAGTGGTCTGGATAGCGCCTCTT 523 

Qy 789 CTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCA 848

I M Ml II II II I MUM II I I I I I I II 

Db 524 GTTTCCAAGTGGTGTCCCTCATGAAGTCCCTGGCACAGGGGGGCCGTACCATCATCTGCA 583 



Qy 849 CCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAA 891

I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 584 CCATCCACCAGCCCAGTGCCAAGCTCTTTGAGATGTTTGACAA 626 



RESULT 4 

US-09-614-912-139 

Sequence 139, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 139 
LENGTH: 4159 
TYPE: DNA 

ORGANISM: Oryza sativa 
US-09-614-912-139 

Query Match 3.1%; Score 73.2; DB 4; Length 4159; 

Best Local Similarity 50.5%; Pred. No. 5.1e-10; 

Matches 205; Conservative 0; Mismatches 198; Indels 3; Gaps 1; 

Qy 647 GCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGG 706 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I II 

Db 414 GCGGACACGATCGTCGGCGACCAGATGCAGAGGGGGATCTCCGGTGGTCAGAAGAAACGC 473 

Qy 707 GTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACC 766 

I M I I I I I I II II I M I II I I I I I I I II II 

Db 474 GTCACCACCGGTGAGATGATTGTCGGTCCAACAAAGGTTCTATTCATGGATGAGATATCA 533 



Qy 767 ACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGC 826 

I I I I I I I I I I I II I II M I II I I III I I I I I II 

Db 534 ACTGGATTGGACAGCTCCACCACATTCCAGATTGTCAAATGCCTTCAGCAAATCGTGCAC 593 

Qy 827 AGGAACCGAATTGTGGTTCTCA CCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883 

I I I I I I I I II I I II I I I I I I I I I I II I 

Db 594 TTGGGCGAGGCAACCATCCTCATGTCACTCCTACAACCAGCCCCTGAGACTTTTGAGCTA 653 

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943 

I I I I MM I I I I I II III III II II I I 

Db 654 TTCGATGACATTATCCTACTGTCAGAAGGCCAGATTGTTTATCAGGGACCCCGCGAATAC 713 

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

I I I I I I I I I I I I I I I I I I I I M I I I I I I II III 

Db 714 GTCCTTGAGTTCTTTGAGTCATGCGGATTCCGCTGCCCAGAGCGTAAGGGTACTGCAGAC 773

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGG 1049

M I I M I II I I I I I I II MINI 

Db 774 TTTCTTCAGGAGGTGACATCAAAGAAGGATCAGGAGCAGTATTGGG 819 



RESULT 5 

US-09-489-039A-2869

Sequence 2869, Application US/09489039A 
Patent No. 6610836 
GENERAL INFORMATION: 
APPLICANT: Gary Breton et. al
TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
FILE REFERENCE: 2709.2004001
CURRENT APPLICATION NUMBER: US/09/489,039A
CURRENT FILING DATE: 2000-01-27 
PRIOR APPLICATION NUMBER: US 60/117,747 
PRIOR FILING DATE: 1999-01-29 
NUMBER OF SEQ ID NOS : 14342 
SEQ ID NO 2869 
LENGTH: 1551 
TYPE: DNA 

ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-2869 

Query Match 2.6%; Score 61.4; DB 4; Length 1551; 

Best Local Similarity 48.3%; Pred. No. 5.8e-07; 

Matches 272; Conservative 0; Mismatches 276; Indels 15; Gaps 3; 

Qy 306 AGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAA 365 

I I I I I II M II I I I I I I I I I I I I M I I I II M I I I I I I 

Db 851 AGGTGCTGAAAGGCATCGATCTGCAGGTGGAGAACGGGGAGGTGATCAGCATCATCGGCC 910 

Qy 366 GCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGG AGGCTGGGGC 419 

I I I I I I I I I I I I I I I I I I II I I I I I II II Mill 

Db 911 CGTCCGGCTCCGGCAAAACCACCCTGATCCGCACCATCAACGCCCTCGAAAGCCTTGATG 970

Qy 420 GCGCGGGGACCTTCCTG GGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGC 476

II I II I I I I I I MUM I M I I I II I I I I I I II 

Db 971 GCGGGGAGATCATTCTCTACGGCGAGGACTATCTT7WVGGGCGGAGCCATCGTCGACAAAC 1030 



Qy 477 AGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCG 536

III II I I I II I Mill I

Db 1031 GCCAGATGCGCGCCGGGGTACGGCGCATCGGCATGGTCTTCCAGAGCTTCAACCTGTTCC 1090

Qy 537 TGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCT 596

I I III III I III II I I I I I I I I I I II I I I I

Db 1091 CCCACCGCACGGTGCTCGACAACGTGATGCTGGCCCCGC GCTATCACCAGCTGC 1144

Qy 597 TCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGAC 656

I I II IIII II III I I II II III I

Db 1145 TGGACCAGCCGGTCGCCCGCGAGCAGGCCCTGGCGCTGCTCGACCGCGTCGGCCTGCTGG 1204

Qy 657 TGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCG 716

I I I I I I I II I I I I II I I I I I I I I I I I I I I

Db 1205 CCCATGCCCACAAGTACCCCGGACAGCTCTCCGGCGGCCAGCAGCAGCGGGTGGCGATCG 1264

Qy 717 CAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGG 776

I I I I I I I I II I I I I I I I I I I I I I II II III I I I I I I

Db 1265 CCCGGGCGCTGGCGCTGAAGCCGGACATTATGCTGTTTGACGAACGGACCTCGGCGCTGG 1324

Qy 777 ACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAA 836

I I I III III I II I II I I I I I

Db 1325 ATCCGGAGCTGGTAGGCGAAGTGCTGAAGGTCATTCAGTCGCTGGCCCGCGAAGGCATGA 1384

Qy 837 TTGTGGTTCTCACCATTCACCAG 859

III I I I I I I I M

Db 1385 CCATGCTGATTGTCACTCACGAG 1407





RESULT 6 

US-09-489-039A-4920

; Sequence 4920, Application US/09489039A 

; Patent No. 6610836

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et. al 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2004001
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27 
; PRIOR APPLICATION NUMBER: US 60/117,747 
; PRIOR FILING DATE: 1999-01-29 
; NUMBER OF SEQ ID NOS : 14342 
; SEQ ID NO 4920 
; LENGTH: 1722
; TYPE: DNA
; ORGANISM: Klebsiella pneumoniae
US-09-489-039A-4920

Query Match 2.6%; Score 61.2; DB 4; Length 1722; 

Best Local Similarity 46.4%; Pred. No. 7e-07; 

Matches 284; Conservative 0; Mismatches 313; Indels 15; Gaps 2;

Qy 270 GGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGT 329

II I I II II I II I III III I II I I I I I I I I II

Db 1025 GGGAGGTCACTTTCCGCTATCCTCAGCAGCCCTCCCCTGCCCTGGAGAATATTTCCCTGC 1084



Qy 330 ACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGC 389 

I II III II III I II I I I I I I I I II I I I II I I M I I 

Db 1085 AGATTGCCGCCGGAGAGCACATCGCCATTCTTGGCCGGACCGGCTGCGGAAAATCGACGC 1144 

Qy 390 TGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATG 449 

II I I I III I I I I I I I I I II III I I M 

Db 1145 TGTTGCAGTTGCTTACCC GCGCCTGGGACCCGTCACAGGGAGAGATTCTG 1194 

Qy 450 TGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGA 509 

I II I II I I III I II I I I II 
Db 1195 CTCAACAATCAGCCGCTCTCCGGCCTCAGCGAAGCCACTCTTCGGCAGGC AATGA 1249 

Qy 510 GCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGG 569 

III I I I I I I II I III I I I I I I I I I I I II I I I I M 

Db 1250 GCGTAGTGCCGCAGCGCGTGCACCTGTTCAGCGCCACCCTGCGCGACAACCTGCTGCTGG 1309 

Qy 570 CCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGC 629 

I I I I I I I II I I I I I I I II 

Db 1310 CGGCGCCTGAAGCGGATGACGCTCATCTCAGCGCTACCCTTGAGAAGGTGGGCCTCGAAA 1369

Qy 630 TGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCA 689 

III I I I I I I II I I I I I I 

Db 1370 AACTGCTGCAAGATGGTGGTCTTAACGGCTGGCTGGGCGAAGGCGGGCGTCAGCTCTCCG 1429

Qy 690 CGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGC 749 

I I I I I I I M M I I I I I I I I I I I M M I I I I I I I I II I I 
Db 1430 GCGGCGAACTGCGCCGACTGGCCATTGCCCGCGCGCTGCTCCATGATGCGCCGCTGATGC 1489

Qy 750 TGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCC 809 

M I I I I M I I I I I I I I I II I III I II II I I III 

Db 1490 TGCTCGATGAACCGACAGAAGGTCTGGATGCGGCCACCGAAAGCCAGATCCTGCATCTAC 1549

Qy 810 TGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTG 869 

I I I I I I III II I I I I I I I I I I I I II I 

Db 1550 TGGCAGATGTCATGCGCGACAAAACCGTGCTGATGGTGACCCATCGCCTGCGGGACCTGG 1609 

Qy 870 AGCTTTTTCAGC 881

I I I I I I I 
Db 1610 CGGGTTTTAATC 1621 



RESULT 7 

US-09-489-039A-3218 

Sequence 3218, Application US/09489039A 
Patent No. 6610836 
GENERAL INFORMATION: 
APPLICANT: Gary Breton et. al 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
FILE REFERENCE: 2709.2004001
CURRENT APPLICATION NUMBER: US/09/489,039A
CURRENT FILING DATE: 2000-01-27 
PRIOR APPLICATION NUMBER: US 60/117,747 
PRIOR FILING DATE: 1999-01-29 



; NUMBER OF SEQ ID NOS: 14342 
; SEQ ID NO 3218 
; LENGTH: 765 
; TYPE: DNA 

; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-3218

Query Match 2.5%; Score 59.4; DB 4; Length 765; 

Best Local Similarity 45.4%; Pred. No. 1.4e-06; 

Matches 262; Conservative 0; Mismatches 306; Indels 9; Gaps 1; 

Qy 286 CCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCA 345 

I I I I I II II II I I I I I I I I I I I I II III I I II I 

Db 30 CTGGAAGGCAGGCAAAAAGGTCATCGTCAATAATGTCTCGCTGCGGGTGCCGCGAGGCGA 89 

Qy 346 GATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTC 405

I I I I I I I I I I II I II I I I I I I I I II II I I II III 
Db 90 AACGGTCGGACTGCTGGGGCCCAACGGCTGCGGCAAATCCTCGCTGCTGCGCGTTCTGGC 149 

Qy 406 CGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCT 465

I I II I I I I II I III II 

Db 150 GGGCCTGCGCCGCCCGGATGCAGGTCGCGTCACCCTCGACGGCCAGGATATCGCCCGGAT 209 

Qy 466 GCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAG 525 

I I I I I I I I I I I I I I I I I I I I I II I I I 

Db 210 GGCGAAAAAGCAGCTCGCCCGCCGCGTGGCTTTCGTCGAGCAACACGGCATGACCGAGGC 269 

Qy 526 CAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAA 585 

II I I I I II I I I I I I I III I II I I I I I 

Db 270 CAATATGCGGGTGCGCGACGTCGTGCGC CTGGGACGCATTCCCCACCACTC 320 

Qy 586 TCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGT 645 

III III III I I I I I I I I III III 

Db 321 TCCGTTCTCAAACTGGAGCGCTCAGGATGACGAGGCGATTGCCGCCGCGCTGCAGCGGGT 380 

Qy 646 GGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCG 705 

II I II III I III I I I I I I I I I I I II I I II 

Db 381 AGCGATGCTGGAGAAAAGCGAACAGGGATGGTTAAGCCTCTCCGGCGGCGAGCGGCAGCG 440 

Qy 706 GGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAAC 765

III I I I I I I I I I I III III I II I I I I I I I I I M I I I 

Db 441 GGTGCATATCGCCCGCGCGCTGGCGCAGAGCCCGAGCGAAATCCTGCTGGATGAGCCGAC 500 

Qy 766 CACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCG 825 

II I I I I I I I I I I I I II I I I I I I 

Db 501 CAACCATCTGGATATACACCATCAGATGCAGTTAATGCAGTTGATCAGCGAGCTGCCGGT 560 

Qy 826 CAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCC 862 

I I I II I I I I II I I I I II 

Db 561 AACCAGCATTGTGGCCATTCACGATCTTAACCATGCC 597 



RESULT 8 

US-09-252-991A-13705 

; Sequence 13705, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 



; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS : 33142 

; SEQ ID NO 13705 

; LENGTH: 1668
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-13705 

Query Match 2.4%; Score 57; DB 4; Length 1668; 

Best Local Similarity 45.7%; Pred. No. 1e-05;

Matches 238; Conservative 0; Mismatches 280; Indels 3; Gaps 1;

Qy 312 TCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAG 371

I I I I III II I I I I I I I I II I I I II I II I I I I 

Db 959 TCGACGGGGTCAATTTCGAACTACCCCGCGGGCAGACGCTGGGCATCGTTGGCGAAAGCG 1018 

Qy 372 GCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCT 431 

I I I I II I I I I I I I I I I I II I I I I I II III I 

Db 1019 GCTCGGGCAAGTCGACCCTTGGCCTGGCAATCCTGCGGCTGCTGGAAAGCCAGGGCGGCA 1078 

Qy 432 TCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCT 491 

III III I I I I I I I I I I I I I II 

Db 1079 TCCGCTTCGAAGGCACCCGGCTGGACGGTCTCGCGCAACATGACGTGCGCCCGCTGCGCC 1138 

Qy 492 TCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGC 551 

I II I I I II I I I I I I I I I I I I I I I 

Db 1139 GCGAGATGCAGGTGGTGTTCCAGGACCCATATGGCAGCCTCAGCCCACGCATGTGTGTCG 1198 

Qy 552 ACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGG 611 

I I I I I I I I I I I I I Mill I II I I 

Db 1199 GCGAGATCGTCGGCGAAGGCCTGCGCATCCATAGGATCGGCAGCGAGGCCGAACAGGAGC 1258 

Qy 612 AGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACA 671 

I I I I I I I I I I I I I I II II III III I I I I 

Db 1259 AGGCGATCATCGACGCGCTG GTGGAGGTCGGGCTCGATCCGCAGACCCGCTACCGTT 1315 

Qy 672 GCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCC 731 

I I I I I I II I I I I I II I I I I I I I I I I II I I 

Db 1316 ACCCCCACGAATTCTCCGGCGGCCAGCGCCAGCGCATCGCCATCGCCCGGGCACTGGTGC 1375 

Qy 732 AGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTA 791 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 1376 TGAAACCGGCACTGATCCTGCTCGACGAACCGACCTCGGCGCTCGACCGCACCGTGCAGC 1435 

Qy 792 ATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAAC 832

I I I I M I I I I I I I I I I I I I I I I I 

Db 1436 GCCAGGTCGTGGAATTGCTGCGGCAACTGCAGGGCAAGTAC 1476



RESULT 9 

US-09-103-840A-2 

; Sequence 2, Application US/09103840A 

; Patent No. 6294328

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M.

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 
; TITLE OF INVENTION: TUBERCULOSIS 
; FILE REFERENCE: 24366-20007.00 
; CURRENT APPLICATION NUMBER: US/09/103,840A
; CURRENT FILING DATE: 1998-06-24
; NUMBER OF SEQ ID NOS: 2
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2
; LENGTH: 4403765
; TYPE: DNA
; ORGANISM: Mycobacterium tuberculosis
; FEATURE:

OTHER INFORMATION: CDC 1551 

OTHER INFORMATION: "n" bases at various positions throughout the sequence 
OTHER INFORMATION: represent a, t, c or g 
US-09-103-840A-2 



Query Match 2.4%; Score 56.8; DB 3; Length 4403765; 

Best Local Similarity 46.0%; Pred. No. 0.0012; 

Matches 273; Conservative 0; Mismatches 312; Indels 9; Gaps 2; 

Qy 271 GGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTA 330 

I I I I I I I I I I II I I I I I I I I I I III 

Db 2879474 GGTCGTCGAGTATTCCAGCGGCGGGTACGCCGTGCGGCCGATCGACGGGTTAAGCCTCGA 2879533



Qy 331 CGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCT 390

I I II I I III II II I I I I II I I I M I I I I I I I I II M 
Db 2879534 CGTGGCGCCGGGGTCGCTGGTGATCTTGCTTGGGCCCAGCGGCTGCGGGAAGACGACCCT 2879593



Qy 391 GCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGT 450 

II I I I II I I I I I I I I II I I I I I I I 

Db 2879594 CTTGTCCTGCCTCGGCGGCATCCTGCGCCCGAAGTCCGGCTCAATCAAGTTTGACGATGT 2879653



Qy 451 GAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAG 510 

I I I I I I I I I II II III II I II 

Db 2879654 CGACATCACGACGCTGGAGGGCGCCGCGCTGGCGAAGTATCGGCGTGACAAGGTAGGGAT 2879713



Qy 511 CGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGC 570 

III III I II I I I I III II I I I I I I I I I I 

Db 2879714 CGTCTTCCAGGCGTTCAACCTGGTCTCGAGCCTTACCGCCCTGGAGAACGTGATGGTCCC 2879773



Qy 571 CATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCT 630 

I I I I II II II I I II I I II III 

Db 2879774 GCTGCGCGCGGCCGG CGTGTCACGAGCGGCCGCGCGTAAGCGTGCCGAGGACCTGCT 2879830

Qy 631 GAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCAC 690 

II I I I I II II I I I I I II II II I II I I I I I 

Db 2879831 GATCCGAGTCAATCTCGGCGAACGAATG AAACACCGCCCGGGTGACATGAGCGG 2879884

Qy 691 GGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCT 750 

II I II M I I I I I I II I I I II I II I I I I I I I I I 

Db 2879885 CGGCCAGCAGCAACGCGTCGCGGTCGCCCGCGCGATCGCGCTGGACCCGCAATTGATCCT 2879944

Qy 751 GTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCT 810

I I I I II I I I I I I II I I II I I I I I III I II I 

Db 2879945 TGCCGACGAACCGACCGCGCACCTGGACTTCATCCAGGTGGAGGAGGTGCTGCGGCTGAT 2880004

Qy 811 GGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCG 864

I I I I I I I I I I I M I II I I I I I I I I III 
Db 2880005 CCGCTCGCTAGCGCAGGGCGACCGTGTGGTGGTGGTCGCGACCCACGACAGCCG 2880058 



RESULT 10 
US-09-103-840A-1 

; Sequence 1, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M. 

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 
; TITLE OF INVENTION: TUBERCULOSIS 
; FILE REFERENCE: 24366-20007.00 
; CURRENT APPLICATION NUMBER: US/09/103, 840A 
; CURRENT FILING DATE: 1998-06-24 
; NUMBER OF SEQ ID NOS : 2 
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1 
; LENGTH: 4411529 
TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 

OTHER INFORMATION: H37Rv 
US-09-103-840A-1 

Query Match 2.4%; Score 56.8; DB 3; Length 4411529; 

Best Local Similarity 46.0%; Pred. No. 0.0012; 

Matches 273; Conservative 0; Mismatches 312; Indels 9; Gaps 2; 

Qy 271 GGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTA 330 

I I I I I I II I I II II I I I I I I I I III 

Db 2883366 GGTCGTCGAGTATTCCAGCGGCGGGTACGCCGTGCGGCCGATCGACGGGTTAAGCCTCGA 2883425



Qy 331 CGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCT 390 

I I I I I I III II II I I II II I I II I II I I II M I I II 
Db 2883426 CGTGGCGCCGGGGTCGCTGGTGATCTTGCTTGGGCCCAGCGGCTGCGGGAAGACGACCCT 2883485

Qy 391 GCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGT 450 

I I I I I II I I II I I I I II I I I I I I I 

Db 2883486 CTTGTCCTGCCTCGGCGGCATCCTGCGCCCGAAGTCCGGCTCAATCAAGTTTGACGATGT 2883545

Qy 451 GAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAG 510 

I I I I I I I I I II II III II I M 

Db 2883546 CGACATCACGACGCTGGAGGGCGCCGCGCTGGCGAAGTATCGGCGTGACAAGGTAGGGAT 2883605

Qy 511 CGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGC 570

III III I I I I I I I III II I I I I I I I I I I 

Db 2883606 CGTCTTCCAGGCGTTCAACCTGGTCTCGAGCCTTACCGCCCTGGAGAACGTGATGGTCCC 2883665

Qy 571 CATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCT 630 

I I I I II MM I I M I I I I Ml 

Db 2883666 GCTGCGCGCGGCCGG CGTGTCACGAGCGGCCGCGCGTAAGCGTGCCGAGGACCTGCT 2883722

Qy 631 GAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCAC 690 

M I I I I I I II I I I I I II I I I I M I I I I I I 

Db 2883723 GATCCGAGTCAATCTCGGCGAACGAATG AAACACCGCCCGGGTGACATGAGCGG 2883776

Qy 691 GGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCT 750 

II II I I I I I I M I MM I II I I M II I I I I I I 

Db 2883777 CGGCCAGCAGCAACGCGTCGCGGTCGCCCGCGCGATCGCGCTGGACCCGCAATTGATCCT 2883836

Qy 751 GTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCT 810 

I I II I I I I I I M II I I II II I I I III I II I 

Db 2883837 TGCCGACGAACCGACCGCGCACCTGGACTTCATCCAGGTGGAGGAGGTGCTGCGGCTGAT 2883896

Qy 811 GGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCG 864 

II I I I I I I I I I I I I II I I I I II I I III 
Db 2883897 CCGCTCGCTAGCGCAGGGCGACCGTGTGGTGGTGGTCGCGACCCACGACAGCCG 2883950



RESULT 11 

US-09-252-991A-9848 

; Sequence 9848, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18 



; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS : 33142 

; SEQ ID NO 9848

; LENGTH: 972 

; TYPE: DNA 

; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-9848 

Query Match 2.4%; Score 56.2; DB 4; Length 972; 

Best Local Similarity 45.6%; Pred. No. 1.3e-05; 

Matches 276; Conservative 0; Mismatches 323; Indels 6; Gaps 2; 

Qy 311 CTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCA 370

I I I I I I I I I I I I I I I I III I I I I I II III II 
Db 88 CTGAACGGCGTATCGTTCGAACTGGAAGCCGGCAAGACCCTCGCCGTGGTCGGCGAGTCC 147 

Qy 371 GGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACC 430

I II I II I I I I II II I I I II I II I I II II I I I I 

Db 148 GGCTGCGGCAAGTCGACCCTGGCGCGCGCCCTGACCCTGATCGAGGAACCCACCTCCGGC 207 

Qy 431 TTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGC 490

I I II I I I I I I I I I I II I I I I I I I 

Db 208 TCGCTGAAAATCGCCGGGCAGGAGGTCAAGGGCGCCAGCAAGGACCAGCGCCGGCA G 264

Qy 491 TTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTG 550 

II I I I I I I I I II I I I I I I I I I I I II 

Db 265 TTGCGCCGCGACGTGCAGATGGTCTTCCAGAACCCCTACGCCTCGCTCAATCCGCGACAG 324

Qy 551 CACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTG 610 

I I I I I I I I I I I II I I I I I I III 

Db 325 AAGATCGGCGACCAGTTGGCCGAGCCGCTGCTGATCAACACCGCGCTGTCGCGCGAGGAA 384 

Qy 611 GAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTAC 670 

II III I I I I I I I I I I I I I I I I I I III 

Db 385 CGCCGCGAAAAGGTCCAGCAGATGATGCGCCAGGTCGGCCTGCGGCCGGAGCATTACCAG 444

Qy 671 AGCTTGGGGGGCATTTCCACGGGTG---AGCGGCGCCGGGTCTCCATCGCAGCCCAGCTG 727

III I I I I I I I I I I I I I I I II I I I I I II III 

Db 445 CGCTACCCGCACATGTTCTCCGGCGGCCAGCGCCAGCGCATCGCCCTGGCCCGGGCGATG 504 

Qy 728 CTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACT 787

I I I I I I I I I I I I I I I I I I M I I I I I I I II I I I I II 
Db 505 ATGCTGCAACCCAAGGTGCTGGTGGCGGACGAGCCGACCTCGGCCCTCGACGTGTCGATC 564 

Qy 788 GCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTC 847

I I I I I I I I I I I I I I II I I I I I I I I 

Db 565 CAGGCCCAGGTACTGAACCTTTTCATGGACCTGCAGCAGCAGTTCCGCACCGCCTACGTG 624 

Qy 848 ACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGC 907 

III II II I I I I I I I I I I I I I I II I 

Db 625 TTCATCTCGCACAACCTGGCGGTGGTCCGCCACGTCGCCGACGACGTCCTGGTGATGTAC 684 

Qy 908 TTCGG 912
        ||||
Db 685 CTCGG 689



RESULT 12 

US-09-252-991A-9760 

; Sequence 9760, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 9760
; LENGTH: 1713
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-9760

Query Match 2.4%; Score 56.2; DB 4; Length 1713; 

Best Local Similarity 45.6%; Pred. No. 1.8e-05; 

Matches 276; Conservative 0; Mismatches 323; Indels 6; Gaps 2;

Qy 311 CTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCA 370

I I I I I I I I I I I I I I I I III I I I I I II III II 

Db 62 CTGAACGGCGTATCGTTCGAACTGGAAGCCGGCAAGACCCTCGCCGTGGTCGGCGAGTCC 121

Qy 371 GGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACC 430 

I II I I I I I I I I I I II I I I I I II I I I I II III I 

Db 122 GGCTGCGGCAAGTCGACCCTGGCGCGCGCCCTGACCCTGATCGAGGAACCCACCTCCGGC 181

Qy 431 TTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGC 490 

I I I I I I I I I M M II I I I I II I I 

Db 182 TCGCTGAAAATCGCCGGGCAGGAGGTCAAGGGCGCCAGCAAGGACCAGCGCCGGCA---G 238

Qy 491 TTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTG 550 

II I I I I I I I I I I I I I I I I I I I I I II 

Db 239 TTGCGCCGCGACGTGCAGATGGTCTTCCAGAACCCCTACGCCTCGCTCAATCCGCGACAG 298 

Qy 551 CACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTG 610 

I I I I I I II I II II I I I I I I IN 

Db 299 AAGATCGGCGACCAGTTGGCCGAGCCGCTGCTGATCAACACCGCGCTGTCGCGCGAGGAA 358 

Qy 611 GAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTAC 670 

II III I I I I I I I I I I I I I I I I I I III 

Db 359 CGCCGCGAAAAGGTCCAGCAGATGATGCGCCAGGTCGGCCTGCGGCCGGAGCATTACCAG 418 

Qy 671 AGCTTGGGGGGCATTTCCACGGGTG---AGCGGCGCCGGGTCTCCATCGCAGCCCAGCTG 727

III I I I I I I I II I I I I I I II II I I I I I III 

Db 419 CGCTACCCGCACATGTTCTCCGGCGGCCAGCGCCAGCGCATCGCCCTGGCCCGGGCGATG 478 



Qy 728 CTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACT 787

I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I II 
Db 479 ATGCTGCAACCCAAGGTGCTGGTGGCGGACGAGCCGACCTCGGCCCTCGACGTGTCGATC 538

Qy 788 GCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTC 847

III I I III I II I I I I I I I II I I I I 

Db 539 CAGGCCCAGGTACTGAACCTTTTCATGGACCTGCAGCAGCAGTTCCGCACCGCCTACGTG 598 

Qy 848 ACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGC 907 

III II II I I I I II II I I I I I I I I I 

Db 599 TTCATCTCGCACAACCTGGCGGTGGTCCGCCACGTCGCCGACGACGTCCTGGTGATGTAC 658 

Qy 908 TTCGG 912 

I I II 

Db 659 CTCGG 663 



RESULT 13 

US-09-252-991A-10208/C

; Sequence 10208, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 10208
; LENGTH: 2805
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-10208 

Query Match 2.4%; Score 56.2; DB 4; Length 2805; 

Best Local Similarity 45.6%; Pred. No. 2.4e-05; 

Matches 276; Conservative 0; Mismatches 323; Indels 6; Gaps 2; 

Qy 311 CTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCA 370 

I I I I I I I I I I I I II I I III I II I I II III II 
Db 1629 CTGAACGGCGTATCGTTCGAACTGGAAGCCGGCAAGACCCTCGCCGTGGTCGGCGAGTCC 1570 

Qy 371 GGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACC 430 

M I I I II I I I I I I I I I I I I I I I I I II II III I 

Db 1569 GGCTGCGGCAAGTCGACCCTGGCGCGCGCCCTGACCCTGATCGAGGAACCCACCTCCGGC 1510 

Qy 431 TTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGC 490

I I I I I I I I I I I I I I I I I I I I I I I 
Db 1509 TCGCTGAAAATCGCCGGGCAGGAGGTCAAGGGCGCCAGCAAGGACCAGCGCCGGCA---G 1453



Qy 491 TTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTG 550 

II I II I II I I II I I I I I I I I I I I II 

Db 1452 TTGCGCCGCGACGTGCAGATGGTCTTCCAGAACCCCTACGCCTCGCTCAATCCGCGACAG 1393

Qy 551 CACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTG 610 

I I I I I I I I I I I II I I I I I I III 

Db 1392 AAGATCGGCGACCAGTTGGCCGAGCCGCTGCTGATCAACACCGCGCTGTCGCGCGAGGAA 1333 

Qy 611 GAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTAC 670 

II III I I I II I I I I I I II I II I I III 

Db 1332 CGCCGCGAAAAGGTCCAGCAGATGATGCGCCAGGTCGGCCTGCGGCCGGAGCATTACCAG 1273

Qy 671 AGCTTGGGGGGCATTTCCACGGGTG---AGCGGCGCCGGGTCTCCATCGCAGCCCAGCTG 727

III I I I I I I I M I I I M I II I I I I I I I III 

Db 1272 CGCTACCCGCACATGTTCTCCGGCGGCCAGCGCCAGCGCATCGCCCTGGCCCGGGCGATG 1213 

Qy 728 CTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACT 787

I I I I II I I I I I I I I I I I M M I I I I I I I I I I I I II 
Db 1212 ATGCTGCAACCCAAGGTGCTGGTGGCGGACGAGCCGACCTCGGCCCTCGACGTGTCGATC 1153 

Qy 788 GCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTC 847

I I I I I III I I I M I I I I I I I I I I I 

Db 1152 CAGGCCCAGGTACTGAACCTTTTCATGGACCTGCAGCAGCAGTTCCGCACCGCCTACGTG 1093 

Qy 848 ACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCATCCTGAGC 907

III II II I I I I I I I I Mill I I I I 

Db 1092 TTCATCTCGCACAACCTGGCGGTGGTCCGCCACGTCGCCGACGACGTCCTGGTGATGTAC 1033 

Qy 908 TTCGG 912 

I I I I 

Db 1032 CTCGG 1028 
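
In RESULT 13 the Db coordinates run downward (1629 to 1028) and the identifier carries a /C suffix. That is consistent with a hit on the complementary strand, although the report itself does not say so. A short Python sketch under that assumption; the helper names are hypothetical and only illustrate how the coordinate pairs on each Db row could be read.

```python
# Translation table for the four DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def db_segment(start, end):
    """Return (strand, length) for the coordinate pair on one Db row.

    Descending coordinates are read as a complementary-strand hit,
    which is an assumption about the report format.
    """
    strand = "forward" if end >= start else "complement"
    return strand, abs(end - start) + 1


print(db_segment(1629, 1570))       # first Db row of RESULT 13
print(db_segment(88, 147))          # first Db row of RESULT 11
print(reverse_complement("CTCGG"))  # last Db row of RESULT 13
```

Both rows span 60 bases; only the direction of the coordinates differs.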



RESULT 14 

US-09-252-991A-13436/C 

; Sequence 13436, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 13436
; LENGTH: 1509
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-13436 



Query Match 2.4%; Score 55.6; DB 4; Length 1509;

Best Local Similarity 46.0%; Pred. No. 2.4e-05;

Matches 227; Conservative 0; Mismatches 264; Indels 3; Gaps 1;



Qy 339 GCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACG 398 

M I I M I I I I I I M I I I I I I I I I I I I I I I I I I I I 

Db 1502 GCGGGCAGACGCTGGGCATCGTTGGCGAAAGCGGCTCGGGCAAGTCGACCCTTGGCCTGG 1443 

Qy 399 CCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCC 458 

III I I I I II I II I I I I I II III I II 

Db 1442 CAATCCTGCGGCTGCTGGAAAGCCAGGGCGGCATCCGCTTCGAAGGCACCCGGCTGGACG 1383 

Qy 459 GGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCC 518 

I Mill I I I I III I I I I I II 

Db 1382 GTCTCGCGCAACATGACGTGCGCCCGCTGCGCCGCGAGATGCAGGTGGTGTTCCAGGACC 1323 

Qy 519 TGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCC 578 

I I II II I I I I III I I I I I I I ' I I I I I I 

Db 1322 CATATGGCAGCCTCAGCCCACGCATGTGTGTCGGCGAGATCGTCGGCGT^GGCCTGCGCA 1263 

Qy 579 GCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGA 638 

I I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 1262 TCCATAGGATCGGCAGCGAGGCCGAACAGGAGCAGGCGATCATCGACGCGCTGGTGGAGG 1203

Qy 639 GCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGC 698 

I II I I I I II I I I I I I I II I II I I I 

Db 1202 TC---GGGCTCGATCCGCAGACCCGCTACCGTTACCCCCACGAATTCTCCGGCGGCCAGC 1146

Qy 699 GGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATG 758 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I 

Db 1145 GCCAGCGCATCGCCATCGCCCGGGCACTGGTGCTGAAACCGGCACTGATCCTGCTCGACG 1086 

Qy 759 AGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAAC 818 

I I I I I I I I I I I I I I I I I I I I II I I I II I I I I 

Db 1085 AACCGACCTCGGCGCTCGACCGCACCGTGCAGCGCCAGGTCGTGGAATTGCTGCGGCAAC 1026 

Qy 819 TGGCTCGCAGGAAC 832 

II I I I I I I 

Db 1025 TGCAGGGCAAGTAC 1012 



RESULT 15 

US-09-252-991A-15851 

; Sequence 15851, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 15851
; LENGTH: 840
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-15851 

Query Match 2.3%; Score 54.2; DB 4; Length 840; 

Best Local Similarity 46.4%; Pred. No. 4.3e-05; 

Matches 253; Conservative 0; Mismatches 283; Indels 9; Gaps 2; 

Qy 318 ATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCG 377 

I I I I I I II II II I I I I I I I I I I I I I I II 

Db 107 ATCTCTCGCTGGCCATCCCCGAGGGTTCGTTCAGTGTGATCGTCGGGCCCAACGCCTGCG 166 

Qy 378 GGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGG 437 

III I I I I I II I M II I I I I I I II I I I I I I I I III 

Db 167 GCAAGTCGACCCTGCTGGCGGCATTGTCGCGCCTGTTGGCGCCGGCCGAGGGCCGGGTGG 226 

Qy 438 GGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCT 497 

III II II I I I I I III I I II I 

Db 227 TGCTGGACGGCAGGGATATCCACAGCCTGCCGGGACGGGAAGTGGCGCGGCGTCTCGpCC 286 

Qy 498 ACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTACA 557 

I I I I I I I I M III I I I I II I I I I I I I III I 

Db 287 TGCTGCCGCAGAGCGCGCTGGCGCCGGATGGCATCACGGTGGCCGAGCTGGTGGCGCGC- 345

Qy 558 CCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCG 617

I I I I I I I II II I I I II I I I I I I I I I I 

Db 346 GGGCGCTATCCGCACCAGTCGTTCCTGCGCCAGTG--------GTCGCCGGCGGATGAGC 397

Qy 618 TCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGG 677 

I I I I I I I III III I I I I II 

Db 398 GCGCGGTAGCGGCGGCGTTACGCGCCACGCGGGTCGACGGCCTGGCCGAGCGACCGCTCG 457 

Qy 678 GGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATC 737 

I I I I I II I I I I I I I I I I I Mill I I I I I I II I 

Db 458 ATGCGCTCTCCGGCGGCCAGCGGCAACGCGTGTGGATCGCCATGGTGCTGGCGCAGGAAA 517 

Qy 738 CTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGA 797

I II I I I I I I I I I I I I I II I I I I I I I I I II I I 

Db 518 CCCCGTTGCTGCTGCTCGACGAGCCGACCACCTACCTGGATATCGTCCACCAGATCGAAT 577 

Qy 798 TTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACC 857

I Ml I I I I I I I I I I I I I I I III III I Mil 
Db 578 TGCTCGAACTGCTCGCCGAGCTGAATCGCCAGGGGCGCACCATCGTCGCCGTGCTGCACG 637

Qy 858 AGCCC 862 

I I I 

Db 638 ACCTC 642 
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.2 


2027 15 US-10-405-806-1 Sequence 1, Appli
44 98 4.2 2053 15 US-10-405-806-12 Sequence 12, Appl
45 98 4.2 2247 9 US-09-866-866A-26 Sequence 26, Appl



ALIGNMENTS 



RESULT 1 
US-09-837-992-4 

; Sequence 4, Application US/09837992
; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 4
; LENGTH: 2340
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human sitosterolemia gene (SSG)
; NAME/KEY: CDS
; LOCATION: (107)..(2062)
; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: protein
US-09-837-992-4 

Query Match 100.0%; Score 2340; DB 9; Length 2340; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I I I I 
Db 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I 

Db 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180 

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240 

Qy 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300 

Qy 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360 



Qy 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420



Qy 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480

Qy 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540 

M I I M I I I M I I I M I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600 

Qy 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660 

I I I I M I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660 

Qy 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720 

Qy 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I M 
Db 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780 

Qy 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840 

Qy 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900 

Qy 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I I M I 
Db 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960 

Qy 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I 
Db 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I 
Db 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080 

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M I I I I I I I I I I I I 
Db 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200 

Qy 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I M I I I I I I I I I I I I I I I I I I I I I 
Db 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260 



Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320 

I I I I I I I I I I I I I M I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380 

Qy 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440

Qy 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500 

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680 

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 
Db 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740 

Qy 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1741 ATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTT 1800

Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 
Db 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860 

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920 

I I I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980 

Qy 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040 

Qy 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100 

Qy 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160



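Qy 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy 2281

Db 2281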



RESULT 2 

US-09-989-981A-5 

; Sequence 5, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 5
; LENGTH: 2340
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (107)..(2062)
; OTHER INFORMATION: human ABCG5 (hABCG5)
US-09-989-981A-5 

Query Match 100.0%; Score 2340; DB 10; Length 2340; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2340; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 CAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGAC 300

Qy 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 CAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCT 360

Qy 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 AGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCG 420

Qy 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGTT 480

Qy 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 CCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCG 540

Qy 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 CGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCA 600

Qy 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 GAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGAT 660

Qy 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 661 TGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGC 720

Qy 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 721 CCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTG 780

Qy 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 781 CATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGT 840

Qy 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 841 GGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGCCAT 900

Qy 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 901 CCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTTCAA 960



Qy 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 961 TGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCTGAC 1020 

Qy 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1021 GTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGATGAT 1080

Qy 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 AGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAGAAT 1140

Qy 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

Db 1141 GAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTT 1200 

Qy 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1201 CTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGC 1260 

Qy 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 AGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCT 1320

Qy 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 

Db 1321 GCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCA 1380 

Qy 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I 

Db 1381 GTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCT 1440 

Qy 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1441 GCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCT 1500 

Qy 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1501 GGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGT 1560 

Qy 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1561 GTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGC 1620

Qy 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1621 TCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCA 1680

Qy 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I M I I I I 

Db 1681 AAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGG 1740 



Qy 1741 1800

Db 1741 1800

Qy 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 1801 TACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTT 1860

Qy 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M
Db 1861 CACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGG 1920

Qy 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M
Db 1921 AATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCT 1980

Qy 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I M

Db 1981 GATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAG 2040

Qy 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2041 GGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCG 2100

Qy 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160

I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2101 ACTGTGCATGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAG 2160

Qy 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2161 GACATCTCAAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCC 2220

Qy 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2221 TTGAATGCAATGGAAGTGGTTTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGT 2280

Qy 2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2281 TATTTGGAAATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340



RESULT 3
US-09-837-992-2

; Sequence 2, Application US/09837992
; APPLICANT: Tian, Hui
; APPLICANT: Shan, Bei

; APPLICANT: Tularik Inc.



; SEQ ID NO 2 

; LENGTH: 2258

; TYPE: DNA
; ORGANISM: Mus musculus
; FEATURE:

; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
; NAME/KEY: CDS
; LOCATION: (47)..(2005)

; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)

; OTHER INFORMATION: protein
US-09-837-992-2 

Query Match 59.6%; Score 1395.6; DB 9; Length 2258; 

Best Local Similarity 80.7%; Pred. No. 0; 

Matches 1642; Conservative 0; Mismatches 389; Indels 3; Gaps 1; 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 GGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCC 60 

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

I I I I I III II III III II I I II I I I II II I II II I I I II 
Db 61 CTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCT 120 

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGCATCCTCCATGCCTC 237

I II I I I I I II I I I I I I I II I I II I I I II II I I I II I II 
Db 121 GGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTC 180 

Qy 238 CTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTG 297 

I II I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I III I II I I I I I I I I I 
Db 181 CTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTG 240 

Qy 298 GACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCAT 357 

I I M I I I I M II I I I I M II II I M I M I I I I I I I I I II I I I II I I I I I II I 
Db 241 GGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCAT 300

Qy 358 CCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGG 417 

I II M I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I 
Db 301 CTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCTCCGGGAGGCTGCG 360 

Qy 418 GCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCA 477 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I II 

Db 361 GCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGCTGCGCAGGGACCA 420 

Qy 478 GTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGT 537 

I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I M I I I I M I M I I I I I I I I II 

Db 421 GTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGAGCAGCCTCACTGT 480

Qy 538 GCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTT 597 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 481 GCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGCTCCGCGGACTTCTA 540 

Qy 598 CCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACT 657 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I II I I I I I I I I I I I I I I I I 
Db 541 CAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCACGTGGCGGACCAAAT 600 



Qy 658 GATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGC 717

Db 601 GATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTTTCCATCGC 660

Qy 718 AGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGA 777 

I I I I I I n II I I I I I II I I I I I M I I II I II M I II I I I I I I II II I I I II 

Db 661 AGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCAACCACAGGACTGGA 720 

Qy 778 CTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAAT 837

I I I I I I I I I I I I I I I I I I I I I I I I III III M I M I I I I I I I I I I I I I I M 

Db 721 CTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCTCGCAGGGACCGAAT 780 

Qy 838 TGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAATTGC 897

I I I I II I I I I I I I I I I I I I I I II I I I I I I I I II M I III II I I I I I I I II 

Db 781 TGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACACTTCGACAAAATTGC 840 

Qy 898 CATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATTTCTT 957 

I II I I I I I I I I II I II II I I I I I 11 I I I I I I I I I III II I M II I I I I I 

Db 841 CATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAGATGCTTGGCTTCTT 900 

Qy 958 CAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTATATGGACCT 1017 

I I I I I I I I I I I I I I I I I I II I I I I I II I I I II II I I I I I II II I I I I I I I 

Db 901 CAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTTTACATGGACTT 960 

Qy 1018 GACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAGTCCAGAT 1077 

III I I I I I I I I I I I I I I I I II II I I II I I I I I I I I I I I MM I II I II I II 

Db 961 GACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTACAAGCGAGTACAGAT 1020

Qy 1078 GATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATATTGAAAG 1137

I I I I II II II I I II I I I II I II I III II II I II MM I II I II II 

Db 1021 GCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTGGAGAACATTGAAAG 1080 

Qy 1138 AATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGT 1197 

I I II M I II M I Mill I II I II I II II II II M II I II I I II II II I 

Db 1081 AGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAAGATCCTCCTGGGAT 1140

Qy 1198 TTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTGGTGAGAAATAAGCT 1257 

Ml I I II II II I II II I I II II I I II II I II I I I I II II M II I M 

Db 1141 GTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTAATGAGGAATAAGCA 1200

Qy 1258 GGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGT 1317 

I I II M II M I I II I I II I I II II M I II II M II I I I II II II I II II I I I 

Db 1201 GGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTCCTCATTTTCTACCT 1260 

Qy 1318 TCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTA 1377 

I I I II I II I I II I II I II II I I II I I II II I I II II I I I II II I I 

Db 1261 TCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGCGTGGGGCTGCTCTA 1320 

Qy 1378 CCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGT 1437

III II I I II I II II II II I I II I I M I II II II II I I II I II M I I I I I II I 

Db 1321 TCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTGAATCTGTTTCCCAT 1380 

Qy 1438 GCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGAT 1497 

I I II II II II II II I I I M II I I I II I M I I Mill II II I II II II II II I I 

Db 1381 GCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCATAAGTGGCAGATGCT 1440 

Qy 1498 GCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAG 1557 

III I II II I II I I I M II I II II I II II I II I II II I I I I I I II II II I 



Db 1441 GCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACGGTCATTTTCAGCAG 1500



Qy 1558 TGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGC 1617 

I I I I I I II I II II I I I I II I I I I II I II I I I I I I I I I I II I I I I I II I I I I I 

Db 1501 TGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTTGGATATTTCTCTGC 1560 

Qy 1618 TGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGT 1677 

II I I I I I I I I I II I II I M I M I I I II I I I I I I I II I I I I I II I I I I I I I II II 

Db 1561 TGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTGCTGCTTGGTATAGT 1620 

Qy 1678 CCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGT 1737

I I I I I I II Mill II I I I I I I I I I I I I I I I I I I I I III I III I I I I I I 
Db 1621 CCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATCTCTGGGCTGCTTAT 1680 

Qy 1738 TGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTA 1797

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I II I 
Db 1681 TGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTAAAAATCCTGGGTTA 1740

Qy 1798 TTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAA 1857

I I I I I I I I I I I I I I I I I I II I I I M I I I M II I I I I I I I I I II I I M I I I I I I 
Db 1741 TTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAGTTTTACGGCCTGAA 1800

Qy 1858 TTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCA 1917 

I II II I I I I M I I I I I I II I I I I I I I I M I I I I I I I I I I I I 

Db 1801 CTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATGTGCGCCATCACCCA 1860

Qy 1918 AGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTT 1977 

III I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 

Db 1861 AGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGATTCACGGCAAACTT 1920

Qy 1978 TCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAAT 2037

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 
Db 1921 CCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATAGTGATTTTTAAAGT 1980 

Qy 2038 AAGGGATCATCTCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTG 2091

Mill I II II II I M I I I I I II I I I I II I I III 

Db 1981 CAGGGACTACCTGATTAGCAGATAGTTAAGATGACAGGCAGGAAAGGGTTAATG 2034 
US-09-989-981A-1

; Sequence 1, Application US/09989981A
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1

; LENGTH: 1959
; TYPE: DNA

; ORGANISM: Mus musculus
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (1)..(1959)

; OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-1 

Query Match 58.4%; Score 1365.4; DB 10; Length 1959; 

Best Local Similarity 81.4%; Pred. No. 0; 

Matches 1595; Conservative 0; Mismatches 361; Indels 3; Gaps 1; 

Qy 107 ATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGC 166 

I I I I I I I I I I I I I I I I III II III III II I I I II M II 
Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 

Qy 167 TCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTGGGC 223

I I I I I II I I I I I I I M I I I I II I II I I II I I I I I I II I I I 
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 224 ATCCTCCATGCCTCCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCT 283 

I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I III 
Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCA 180 

Qy 284 TGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGG 343

I I I I I I II I I I I I I I I I I I I M I I I I I I I I II I I I I M I I I I I I II I I I II 
Db 181 TGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGC 240

Qy 344 CAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATG 403 

I I M I I I I II I I I I II I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 241 CAGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATC 300 

Qy 404 TCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCG 463

I I I I I I I I I I II I II II I I M I I I I I I I I II I I I I I I I II I III I I I 

Db 301 TCCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAG 360 

Qy 464 CTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTG 523 

I I I II I I I II II I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 361 CTGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTG 420 

Qy 524 AGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGC 583 

I I I I I I I I I II M I I I I I I I I II III I I I I I III II II I I I I II I I I I I II 
Db 421 AGCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGCGATGCTGGCCCTCTGCCGCAGC 480 

Qy 584 AATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCAT 643 

I I II II I I I I I I II I I I I I I I I I M I I M I I I I I I I I I I I I I I II 
Db 481 TCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCCAC 540

Qy 644 GTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGC 703

I I I I I I I I I I I I I I I I I I I III I M I I I I I M I I I I I II I I I I I I I II 
Db 541 GTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGC 600 



Qy 704 CGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCA 763 

M II I I I I I II I I I II II II II I I I I I II II I II I I I I II I I I I I I I I M 
Db 601 CGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGCCA 660

Qy 764 ACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCT 823 

I I I I II II I I I I I I M I I II I I I I I Mill I I I I I I I III III II I I I I I I 

Db 661 ACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGGCT 720

Qy 824 CGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTC 883

MINI I I I I I I I I I I I II I I I I I I I I I M I M I II I I I I I M I II II I I 

Db 721 CGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAACAC 780

Qy 884 TTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAA 943 

II I I I I I I I I II I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I III 

Db 781 TTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGGAG 840

Qy 944 ATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

I I I I I I I I I M I I I I I MM II I I II M II II M II I II I I I II II I I I I I 

Db 841 ATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGAT 900 

Qy 1004 TTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCC 1063 

II M II I I II II I I I II M I II I II II I II II II I I I I II I I I II II I I I 

Db 901 TTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGTAC 960 

Qy 1064 AAGAGAGTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTG 1123 

I 11 II II I II II I I I II I I II II II II I II I I II I III I I II I II 

Db 961 AAGCGAGTACAGATGCTGGAATGTGCCTTCAAGGAATCTGACATCTATCACAAAATTCTG 1020 

Qy 1124 AAGAATATTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAA 1183 

MM II I I II II I I II II I II II I II I II I I I II I II I I I I II M I III 

Db 1021 GAGAACATTGAAAGAGCACGATACCTGAAAACCTTACCCATGGTTCCTTTCAAAACAAAA 1080 

Qy 1184 GATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAGGAGAGTGACAAGAAACTTG 1243 

II I I II II I I I II I II II I M II II II II I I I I II I I I II II I I II 

Db 1081 GATCCTCCTGGGATGTTCGGCAAGCTTGGTGTCCTGCTGAGGCGAGTAACAAGAAACTTA 1140

Qy 1244 GTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAATCTGATCATGGGTTTGTTC 1303 

MM I II II M I II I I II I I II II I II II I II II II I II I I II M II I I III 
Db 1141 ATGAGGAATAAGCAGGCAGTGATTATGCGTCTCGTTCAGAATCTGATCATGGGCCTCTTC 1200 

Qy 1304 CTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAAGGGTGCTATCCAGGACCGC 1363

M I II I I I I I I I I I I I I I II I II I I M II II II I II I II I M II II 
Db 1201 CTCATTTTCTACCTTCTCCGCGTCCAGAACAACACGCTAAAGGGCGCTGTGCAGGACCGC 1260

Qy 1364 GTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACACAGGCATGCTGAACGCTGTG 1423 

M M II II M III I II I II I II I I I I I I I I I I I M II I II I II I II II I 
Db 1261 GTGGGGCTGCTCTATCAGCTTGTGGGTGCCACCCCATACACCGGCATGCTCAATGCTGTG 1320 

Qy 1424 AATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAG 1483 

II I I M II I I II Mill MM M I II II II I M I II I II I M II I II I I II II 

Db 1321 AATCTGTTTCCCATGCTGAGAGCCGTCAGCGACCAGGAGAGTCAGGATGGCCTGTATCAT 1380 

Qy 1484 AAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACC 1543 

M I II II I I I M MM II I I I I II II II M I I II I I I M II I M I Mill 

Db 1381 AAGTGGCAGATGCTGCTCGCCTACGTGCTACACGTCCTCCCCTTCAGCGTCATCGCCACG 1440 

Qy 1544 ATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCTTACATCCTGAGGTTGCCCGATTT 1603 



Db 1441 GTCATTTTCAGCAGTGTGTGTTATTGGACTCTGGGCTTGTATCCTGAAGTTGCCAGATTT 1500 

Qy 1604 GGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAATTGGTGAATTTCTAACTCTTGTG 1663 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1501 GGATATTTCTCTGCTGCTCTTTTGGCCCCTCACTTAATTGGAGAATTTCTAACACTTGTG 1560 

Qy 1664 CTACTTGGTATCGTCCAAAATCCAAATATAGTCAACAGTGTAGTGGCTCTGCTGTCCATT 1723 

II I I I I I I I I I I I I I I I I II I I I I I I II I I II I I I I I I I I I II I I II III 

Db 1561 CTGCTTGGTATAGTCCAAAACCCTAATATTGTCAACAGTATAGTGGCTCTGCTCAGCATC 1620 

Qy 1724 GCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACATACAAGAAATGCCCATTCCTTTT 1783

I III I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I II I I II I I I I I I I I I I 
Db 1621 TCTGGGCTGCTTATTGGATCTGGATTTATCAGAAACATACAAGAAATGCCCATTCCTTTA 1680

Qy 1784 AAAATCATCAGTTATTTTACATTCCAAAAATATTGCAGTGAGATTCTTGTAGTCAATGAG 1843

I I I I II I I I I I I I I I I I I II M II I II I I II I I II I I I I I I M I I I I I I I I I 

Db 1681 AAAATCCTGGGTTATTTTACATTCCAAAAATACTGTTGTGAGATTCTCGTGGTCAATGAG 1740 

Qy 1844 TTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATGTTTCTGTGACAACTAATCCAATG 1903 

II I M I I I I I I I II I I I I I II I I I I I I I II I I I I I I I I I I I I 

Db 1741 TTTTACGGCCTGAACTTCACTTGTGGTGGATCCAACACCTCTATGCTAAATCACCCGATG 1800 

Qy 1904 TGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAACCTGCCCAGGTGCAACATCTAGA 1963 

II III I II I I I I I I I II I I I I I I I I I I I I II M I I I I I I I I I I I I I I III 
Db 1801 TGCGCCATCACCCAAGGGGTCCAGTTCATCGAGAAAACCTGCCCAGGTGCTACATCCAGA 1860

Qy 1964 TTCACAATGAACTTTCTGATTTTGTATTCATTTATTCCAGCTCTTGTCATCCTAGGAATA 2023 

I I I I I II I I I II M II III I I I II I I I I I I M I I I I I I I I I I I I I I I 

Db 1861 TTCACGGCAAACTTCCTCATCTTATATGGGTTTATCCCAGCTCTGGTCATCCTAGGAATA 1920

Qy 2024 GTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGTAG 2062 

II I I I I I I I I I I I M I I I I I I I I I I I I I I 

Db 1921 GTGATTTTTAAAGTCAGGGACTACCTGATTAGCAGATAG 1959 
Query Match 50.2%; Score 1174.2; DB 15; Length 2512;



Best Local Similarity 71.0%; Pred. No. 0; 

Matches 1729; Conservative 0; Mismatches 603; Indels 103; Gaps 9; 

Qy 1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 81 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 140 

Qy 61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 141 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 200 

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 201 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 260

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I M I 
Db 261 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 320 

Qy 241 CAGCGTCAGCCACCGC----GTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGT 296

I I I I I I I I I I I I II I I I I I I I I I II 

Db 321 CAGCGTCAGGTAAGGCAGAGCCCTTGCTGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGC 380

Qy 297 GGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCA 356 

II I II I I I I I II I I I I I I I I I 

Db 381 GCTCACCCCTCTGCTGCCTTTCTTCACTCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCC 440

Qy 357 TCCTAGGAAGCTCAGGCTCC----------GGGAAAACCACGCTGCTGGACGCCATGTCC 406

II I III I I I I I I I I I I M I Mil 

Db 441 CTCTTTAGTGGATCGGGTGGAGAGAGGAGAGGGAGAAGGGCTGTGCTGGGAAACATGGAG 500 

Qy 407 GGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTG 466

I I III I I I I I I I I I I I I I I I 

Db 501 CGACAGTGAATGGCCCCTCCCCCTGCCCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAG 560 

Qy 467 CGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGC 526

I I I I I I II I I I I I I I I I II 

Db 561 CAGTGCCCTGCCAACCCAGTGTCTACGGCCTGCCCTCTGTGGATGGGAATGGGGGTACTG 62 0 

Qy 527 AGCCT-------------CACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCAT 573

II I I I I I I I I I I I I II I 

Db 621 CGAATGCAAGGAGTCTTGAAACCTGGTGAAAGAATGCAGGGACAGCCACCTCGCAGCCAA 680

Qy 574 CCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAG 633 

II II I I I I I I I I I I I I II I I I M II I 

Db 681 ACGGACAGGACATTCAGAGCAACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCT 740 

Qy 634 TCTGAGCCATGTGGCAG-------------------ACCGACTGATTGGCAACTACAGCT 674

I I I I I I I I I I M I I I I I III 

Db 741 CAGTCGCTATCTGCCAGGTTCTACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCC 800

Qy 675 TGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCAT------------------CG 716

II II I I I I I I I I I I I I I I 

Db 801 TGTCCGGAGACTACTGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGC 860

Qy 717 CAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGG 776 

I I I I I I I I I I I I I I I I I II II II 



Db 861 CCCTTCCAGGGCCCCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTG 920

Qy 777 ACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAA 836 

I III I I I I I I I I M I I I I I M III 

Db 921 AGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGAT 980 

Qy 837 TTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAG-------CTCTTTGAC 889

I I I I I I I I I I II I I I I I III 

Db 981 GTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGG 1040 

Qy 890 AAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTT 949

I I I I I I I M I I II I I I I I I I II 

Db 1041 AAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGG 1100 

Qy 950 GATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGACTTCTAT 1009 

I I I I II I I I I I I I I II I II I I I II 

Db 1101 GAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGCAGT---TCCAGGACTGCTTCTCC 1157

Qy 1010 ATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGA 1069 

I M I I I I I I I I I I I I I I I 

Db 1158 TACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCGTGCGCGAGACGCTGCACTAC 1217 

Qy 1070 GTCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAAT 1129

I I M I I II I I I I I I I I I I I I I I 

Db 1218 ACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCTTCCAGAAGAAGGTGGAGGCC 1277 

Qy 1130 AT--------------------------TGAAAGAATGAAACACCTGAAAACGTTACCAA 1163

I I I I I I I I I I I I I 

Db 1278 GTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACCGACTGATTGGCAACTACAGCTTG 1337 

Qy 1164 TGGTTCCTTTCAAAACCAAAGATTCTCCTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGA 1223

I I I I I I I I II I I I I I I I I I I I I I 

Db 1338 GGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGAT 1397 

Qy 1224 GGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTC 1280

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I II I I I I I I I I 
Db 1398 CCTAGAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTC 1457 

Qy 1281 AGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGC 1340 

I I I I I M I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I M M 
Db 1458 AGAATCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGC 1517 

Qy 1341 TAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGT 1400

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 1518 TAAAGGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGT 1577 

Qy 1401 ACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGG 1460

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 1578 ACACAGGCATGCTGAACGCTGTGAATCTGTTTCCCGTGCTGCGAGCTGTCAGCGACCAGG 1637 

Qy 1461 AGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCC 1520 

I I I I I I I I I I I I I I I I I I I M I I II I I M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I 
Db 1638 AGAGTCAGGACGGCCTCTACCAGAAGTGGCAGATGATGCTGGCCTATGCACTGCACGTCC 1697 

Qy 1521 TCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCT 1580 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1698 TCCCCTTCAGCGTTGTTGCCACCATGATTTTCAGCAGTGTGTGCTACTGGACGCTGGGCT 1757 



Qy 1581 TACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAA 1640

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I M I I 
Db 1758 TACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGCCCCCCACTTAA 1817 

Qy 1641 TTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACA 1700

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I 
Db 1818 TTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAATATAGTCAACA 1877

Qy 1701 GTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACA 1760 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1878 GTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATTCCTCAGAAACA 1937

Qy 1761 TACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCA 1820 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1938 TACAAGAAATGCCCATTCCTTTTAAAATCATCAGTTATTTTACATTCCAAAAATATTGCA 1997 

Qy 1821 GTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATG 1880

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I 
Db 1998 GTGAGATTCTTGTAGTCAATGAGTTCTACGGACTGAATTTCACTTGTGGCAGCTCAAATG 2057 

Qy 1881 TTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAA 1940

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 2058 TTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAATTCATTGAGAAAA 2117

Qy 1941 CCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTC 2000 

I I I I I I I I I I I I I I I I I I I I I M I I I I I M M I I I I I I I I I I I I I I I M I I I I I I I I I I I 
Db 2118 CCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGTATTCATTTATTC 2177 

Qy 2001 CAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGT 2060 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2178 CAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATCTCATTAGCAGGT 2237 

Qy 2061 AGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGA 2120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2238 AGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCATGACTGCTCTGA 2297 

Qy 2121 ACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAAC 2180

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
Db 2298 ACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTCAAGTCTTTTAAC 2357 

Qy 2181 CATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGT 2240 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2358 CATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGCAATGGAAGTGGT 2417 

Qy 2241 TTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGA 2300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2418 TTATAGTCCCTTGCTCTTACAACTTGCAGGGACATGTGGTTATTTGGAAATTGTGACTGA 2477 

Qy 2301 GCGGACCCAAGAATGTAAATAATATTCATAAACCT 2335 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2478 GCGGACCCAAGAATGTAAATAATATTCATAAACCT 2512 
Qy 1869 GCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAAT 1928

I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 GCAGCTCAAATGTTTCTGTGACAACTAATCCAATGTGTGCCTTCACTCAAGGAATTCAAT 60 

Qy 1929 TCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGT 1988

I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I 
Db 61 TCATTGAGAAAACCTGCCCAGGTGCAACATCTAGATTCACAATGAACTTTCTGATTTTGT 120 

Qy 1989 ATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATC 2048 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I M I I 

Db 121 ATTCATTTATTCCAGCTCTTGTCATCCTAGGAATAGTTGTTTTCAAAATAAGGGATCATC 180 

Qy 2049 TCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCA 2108 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 TCATTAGCAGGTAGTGAAAGCCATGGCTGGGAAAATGGAAGTGAAGCTGCCGACTGTGCA 240 

Qy 2109 TGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTC 2168 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 TGACTGCTCTGAACGTCTGAAATGAGAGTGCCATGTATTTCTTTCTTGACAGGACATCTC 300 

Qy 2169 AAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGC 2228 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 AAGTCTTTTAACCATTAAGACTCCATTTGTGCCTCTTGGATCCAAGCAGGCCTTGAATGC 360 



Qy 2229 2288

Db 361 420



Qy 2289 AATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 2340

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 AATTGTGACTGAGCGGACCCAAGAATGTAAATAATATTCATAAACCTATGGG 472 



RESULT 7 
US-09-837-992-7 

; Sequence 7, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992 

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234 

; PRIOR FILING DATE: 2000-05-15 

; NUMBER OF SEQ ID NOS: 45 

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 7 

; LENGTH: 249

; TYPE: DNA
; ORGANISM: Homo sapiens 
; FEATURE: 

; OTHER INFORMATION: exon 1 of hSSG 
US-09-837-992-7 

Query Match 10.6%; Score 249; DB 9; Length 249; 

Best Local Similarity 100.0%; Pred. No. 2.3e-64; 

Matches 249; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy   1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 GTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGA 60

Qy  61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 GGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTC 120

Qy 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 ATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCT 180

Qy 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 GGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCTCCTA 240

Qy 241 CAGCGTCAG 249
       |||||||||
Db 241 CAGCGTCAG 249



RESULT 8 

US-09-837-992-14 

; Sequence 14, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992 

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 14
; LENGTH: 214
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:

; OTHER INFORMATION: exon 8 of hSSG 
US-09-837-992-14 

Query Match 9.1%; Score 214; DB 9; Length 214; 

Best Local Similarity 100.0%; Pred. No. 8e-54; 

Matches 214; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1011 TGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAG 1070
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 TGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCTCCAAGAGAG 60

Qy 1071 TCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATA 1130
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TCCAGATGATAGAATCTGCCTACAAGAAATCAGCAATTTGTCATAAAACTTTGAAGAATA 120

Qy 1131 TTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTC 1190
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 TTGAAAGAATGAAACACCTGAAAACGTTACCAATGGTTCCTTTCAAAACCAAAGATTCTC 180

Qy 1191 CTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAG 1224
        ||||||||||||||||||||||||||||||||||
Db  181 CTGGAGTTTTCTCTAAACTGGGTGTTCTCCTGAG 214



RESULT 9 

US-09-837-992-15 

; Sequence 15, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 



; APPLICANT: Shan, Bei 
; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992 

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 15
; LENGTH: 206
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:

; OTHER INFORMATION: exon 9 of hSSG 
US-09-837-992-15 

Query Match 8.8%; Score 206; DB 9; Length 206; 

Best Local Similarity 100.0%; Pred. No. 2.1e-51; 

Matches 206; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1225 GAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAA 1284
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GAGAGTGACAAGAAACTTGGTGAGAAATAAGCTGGCAGTGATTACGCGTCTCCTTCAGAA 60

Qy 1285 TCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAA 1344
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TCTGATCATGGGTTTGTTCCTCCTTTTCTTCGTTCTGCGGGTCCGAAGCAATGTGCTAAA 120

Qy 1345 GGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACAC 1404
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 GGGTGCTATCCAGGACCGCGTAGGTCTCCTTTACCAGTTTGTGGGCGCCACCCCGTACAC 180

Qy 1405 AGGCATGCTGAACGCTGTGAATCTGT 1430
        ||||||||||||||||||||||||||
Db  181 AGGCATGCTGAACGCTGTGAATCTGT 206



RESULT 10 
US-09-989-981A-7 

; Sequence 7, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 7
; LENGTH: 2669
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (100)..(2121)
; OTHER INFORMATION: human ABCG8 (hABCG8)
US-09-989-981A-7 

Query Match 8.7%; Score 203.6; DB 10; Length 2669; 

Best Local Similarity 54.4%; Pred. No. 6.8e-50; 

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1; 

Qy 285 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 344

I I I I I I I I III I II I I I III I I I I I I I I I I 

Db 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394

Qy 345 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 404 

II I I II I II I II II I I I I I I I I I I I I I I II I II I I I I I II 

Db 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

Qy 405 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 464

I I I I III III II I I I I I I I I I I I I 

Db 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

Qy 465 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 524 

I I I I I I I I II I I I I I I I J III III Mill 
Db 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

Qy 525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584

I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I 

Db 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

Qy 585 ATCCCGGCTCCTTCC---AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 641

II II I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I 

Db 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

Qy 642 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 701 

I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I 

Db 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

Qy 702 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 761 

I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 

Db 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

Qy 762 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 821

I I I I I I I I I I I I I I I I I I I I I III III I I I I 

Db 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 



Qy 822 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 881

Db 875 [sequence illegible]



Qy 882 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 941 

I I I I I I I II III I II I I I I I I II III I 

Db 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

Qy 942 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1001

I I I I I II I I I I II I I I I I I I I M I M I I I I I I I II 

Db 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

Qy 1002 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1061

I I I I I II I I I I I I I I I I I I I I I I I II I II I I I I I I I III 

Db 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

Qy 1062 CCAAGAGAGTCCAG 1075 

III I III 
Db 1115 GGGAGAAGGCTCAG 1128



RESULT 11 
US-09-989-981A-3 

; Sequence 3, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 3
; LENGTH: 2019
; TYPE: DNA
; ORGANISM: Mus musculus
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (1)..(2019)
; OTHER INFORMATION: mouse ABCG8 (mABCG8)
US-09-989-981A-3 

Query Match 8.5%; Score 199.2; DB 10; Length 2019; 

Best Local Similarity 54.0%; Pred. No. 1.2e-48; 

Matches 430; Conservative 0; Mismatches 363; Indels 3; Gaps 1; 

Qy 283 TTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGG 342 

I I I I I I II III III MM III II I I I I I I I 



Db 234 TAGCAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGG 293 

Qy 343 GCAGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCAT 402

I I I I I II M I I I I II I I I I M II I I II I I I I II I II I I I I M II 

Db 294 ACAGATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGAT 353 

Qy 403 GTCCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGC 462

INN II II Mill I I I II I I 

Db 354 CACAGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACC 413 

Qy 463 GCTGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCT 522

I I I I I I I I I I I I I I I I I II II III I I I I I 
Db 414 CAGTACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCT 473 

Qy 523 GAGCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGG 582 

I II I I I I I I I I I I I I I I I I I III II I I II I I I I I 
Db 474 GCCCAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGAC 533 

Qy 583 CAATCCCGGCTCCTTCC---AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAG 639

I II II I III I I I I I I I I II I I II I I II M I I II I 

Db 534 CTTCTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCG 593 

Qy 640 CCATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCG 699 

II II II I I I I II I I III I I I I I M I I I I I I 

Db 594 GCAGTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCG 653 

Qy 700 GCGCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGA 759

I I I I I I II I I I I I I I I I I I I I I III I I I I I I I 

Db 654 CCGACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGA 713 

Qy 760 GCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACT 819 

I I I I I I I I I I I I I I I I I I I I I I III III II 

Db 714 ACCCACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCT 773

Qy 820 GGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCA 879

III I I I I I I I I I I I I II I I I I I I I I I I I I I I I I III 

Db 774 GGCCAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAG 833 

Qy 880 GCTCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGC 939

I I I I M I I I I I III I II I MM M III 

Db 834 GCTATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCA 893 

Qy 940 GGAAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTT 999

I I M I I I I I I I I II I II II I II II II II II M I II I 

Db 894 GCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGC 953

Qy 1000 TGACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAAC 1059 

II I I II M I II I I II I I I II II II II II M II II 

Db 954 GGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCAC 1013 

Qy 1060 CTCCAAGAGAGTCCAG 1075 

I III I III 

Db 1014 CGTGGAGAAGGCACAG 1029 



RESULT 12 
US-09-837-992-17 



; Sequence 17, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION:

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US

; CURRENT APPLICATION NUMBER: US/09/837,992

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234 

; PRIOR FILING DATE: 2000-05-15 

; NUMBER OF SEQ ID NOS: 45

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 17

; LENGTH: 186

; TYPE: DNA

; ORGANISM: Homo sapiens

; FEATURE:

; OTHER INFORMATION: exon 11 of hSSG
US-09-837-992-17 

Query Match 7.9%; Score 186; DB 9; Length 186; 

Best Local Similarity 100.0%; Pred. No. 2.1e-45; 

Matches 186; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1570 GACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGC 1629
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 GACGCTGGGCTTACATCCTGAGGTTGCCCGATTTGGATATTTTTCTGCTGCTCTCTTGGC 60

Qy 1630 CCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAA 1689
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CCCCCACTTAATTGGTGAATTTCTAACTCTTGTGCTACTTGGTATCGTCCAAAATCCAAA 120

Qy 1690 TATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATT 1749
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 TATAGTCAACAGTGTAGTGGCTCTGCTGTCCATTGCGGGGGTGCTTGTTGGATCTGGATT 180

Qy 1750 CCTCAG 1755
        ||||||
Db  181 CCTCAG 186



RESULT 13 

US-10-425-114-32175 

; Sequence 32175, Application US/10425114
; Publication No. US20040034888A1
; GENERAL INFORMATION:
; APPLICANT: Liu, Jingdong
; APPLICANT: Zhou, Yihua
; APPLICANT: Kovalic, David K.
; APPLICANT: Screen, Steven E
; APPLICANT: Tabaska, Jack E
; APPLICANT: Cao, Yongwei
; TITLE OF INVENTION: Nucleic Acid Molecules and Other Molecules Associated With
; TITLE OF INVENTION: Plants and Uses Thereof for Plant Improvement
; FILE REFERENCE: 38-21(53313)B
; CURRENT APPLICATION NUMBER: US/10/425,114
; CURRENT FILING DATE: 2003-04-28
; NUMBER OF SEQ ID NOS: 73128
; SEQ ID NO 32175
; LENGTH: 2585
; TYPE: DNA
; ORGANISM: Zea mays
; FEATURE:
; OTHER INFORMATION: Clone ID: UC-ZMFLB73274A02_FLI
US-10-425-114-32175 

Query Match 6.7%; Score 156.8; DB 12; Length 2585; 

Best Local Similarity 51.1%; Pred. No. 9.6e-36; 

Matches 424; Conservative 0; Mismatches 397; Indels 9; Gaps 2; 

Qy 177 CCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTGGGCATCCTCCATGCCT 236 

I I I I I I I I I I I I I I I I I I I I I III II 

Db 441 CCCTGTGGCGGGACAGCAAGGCGCTCCCGCCGGGGGCCGGCCCCGCCGCGCTCATCGGCG 500 

Qy 237 CCTACAGCGTCAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGT 296

I M I M I I I I I I II I I I I II II 

Db 501 ACGTGTCCGCCAGGCTCACGTGGAAGGACCTCTGCGTCACCGTGGCTCTGGGCCCCGGCA 560 

Qy 297 GGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCA 356

III I III III I I I I I I I I I I I I I I I I I III 

Db 561 AGACGCAGACCGTGCTGGACGAGCTCACCGGGTACGCGGAGCCCGGGTCGCTGACCGCGC 620 

Qy 357 TCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGTCCGGGAGGCTGG 416 

I I I I I M I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 621 TCATGGGGCCCTCGGGGTCCGGCAAGTCCACCCTGCTCGACGCCCTCGCCGGCCGCCTCG 680 

Qy 417 GGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGCTGCGCCGGGAGC 476 

I I III I I I I II I I I I I I I I I . I III 
Db 681 CCGCCAACGCCTTCCTCTCCGGCAACGTGCTCCTCAACGGCCGCAAG------GCCAAGC 734

Qy 477 AGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGAGCAGCCTCACCG 536 

II II I I I I I I I II I I I I I I I I I I I I II I I I I I 

Db 735 TCTCCTTCGGCGCCGCGGCGTACGTGACGCAGGACGACAACCTGATCGGGACGCTGACGG 794 

Qy 537 TGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCAATCCCGGCTCCT 596 

I I I I I I I I I II I I I I I I I I I I I I I I I II I I II I 

Db 795 TGCGCGAGACGATCGGCTACTCGGCGCTGCTGCGGCTGCCGGACAAGATGCCGCGGGAGG 854

Qy 597 TCCAG---AAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCCATGTGGCAGACC 653

Ml I I I I I I I II I I I I I II I I I I I I I I I I I II 

Db 855 ACAAGCGCGCGCTGGTGGAGGGCACCATCGTCGAGATGGGGCTGCAGGACTGCGCCGACA 914 

Qy 654 GACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGCGCCGGGTCTCCA 713 

I I I I I I I I I I II III I I I I I I I I I I I I I I I I II 

Db 915 CCGTCATCGGCAACTGGCACCTCCGCGGGGTCAGCGGCGGCGAGAAGCGCCGCGTCAGCA 974 



Qy 714 TCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGCCAACCACAGGCC 773



Db 975 TCGCGCTCGAGCTACTCATGCGCCCGCGCCTCCTCTTCCTCGACGAGCCCACCAGCGGCC 1034 

Qy 774 TGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGGCTCGCAGGAACC 833 

I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1035 TCGACAGCTCGTCTGCGTTCTTCGTGACGCAGACGCTGCGGGGCCTGGCGAGGGACGGCA 1094 

Qy 834 GAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGCTCTTTGACAAAA 893 

II I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 

Db 1095 GGACGGTGATTGCTTCCATCCACCAGCCCAGCAGCGAGGTGTTCGAGCTCTTCGACATGC 1154 

Qy 894 TTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGGAAATGCTTGATT 953

I I I I I I MM I I M M M M M III 

Db 1155 TCTTCCTGCTATCCGGGGGCAAGACCGTCTACTTCGGACAAGCATCGCAAGCATGCGAGT 1214 

Qy 954 TCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTGAC 1003 

II II II II I II M II II II I I II I I I III 

Db 1215 TCTTTGCTCAAGCCGGTTTCCCTTGCCCGGCTCTGCGGAATCCGTCCGAC 1264



RESULT 14 
US-09-837-992-12 

; Sequence 12, Application US/09837992

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234 

; PRIOR FILING DATE: 2000-05-15 

; NUMBER OF SEQ ID NOS: 45

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 12

; LENGTH: 140

; TYPE: DNA

; ORGANISM: Homo sapiens

; FEATURE:

; OTHER INFORMATION: exon 6 of hSSG
US-09-837-992-12 

Query Match 6.0%; Score 140; DB 9; Length 140; 

Best Local Similarity 100.0%; Pred. No. 1.4e-31; 

Matches 140; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 741 AGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTG 800
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 AGGTCATGCTGTTTGATGAGCCAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTG 60

Qy 801 TCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGC 860
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 TCGTCCTCCTGGTGGAACTGGCTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGC 120

Qy 861 CCCGTTCTGAGCTTTTTCAG 880
       ||||||||||||||||||||
Db 121 CCCGTTCTGAGCTTTTTCAG 140



RESULT 15 

US-10-027-632-152155 

; Sequence 152155, Application US/10027632 

; Publication No. US20030204075A9 

; GENERAL INFORMATION: 

; APPLICANT: Wang, David G. 

; TITLE OF INVENTION: Identification and Mapping of Single Nucleotide 
; TITLE OF INVENTION: Polymorphisms in the Human Genome 
; FILE REFERENCE: 108827.129 

; CURRENT APPLICATION NUMBER: US/10/027,632 

; CURRENT FILING DATE: 2002-04-30 

; PRIOR APPLICATION NUMBER: US 60/218,006 

; PRIOR FILING DATE: 2000-07-12 

; PRIOR APPLICATION NUMBER: US 60/198,676 

; PRIOR FILING DATE: 2000-04-20 

; PRIOR APPLICATION NUMBER: US 60/193,483 

; PRIOR FILING DATE: 2000-03-29
; PRIOR APPLICATION NUMBER: US 60/185,218 
; PRIOR FILING DATE: 2000-02-24 
; PRIOR APPLICATION NUMBER: US 60/167,363 
; PRIOR FILING DATE: 1999-11-23 
; PRIOR APPLICATION NUMBER: US 60/156,358 
; PRIOR FILING DATE: 1999-09-28 
; PRIOR APPLICATION NUMBER: US 60/146,002 
; PRIOR FILING DATE: 1999-08-09 
; NUMBER OF SEQ ID NOS: 325720 

; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 152155
; LENGTH: 759
; TYPE: DNA
; ORGANISM: Human 
US-10-027-632-152155 

Query Match 6.0%; Score 139.6; DB 15; Length 759; 

Best Local Similarity 99.3%; Pred. No. 6.4e-31; 

Matches 139; Conservative 1; Mismatches 0; Indels 0; Gaps 0; 

Qy 1431 TTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGC 1490
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   42 TTCCCGTGCTGCGAGCTGTCAGCGACCAGGAGAGTCAGGACGGCCTCTACCAGAAGTGGC 101

Qy 1491 AGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTGCCACCATGATTT 1550
        |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||||
Db  102 AGATGATGCTGGCCTATGCACTGCACGTCCTCCCCTTCAGCGTTGTTRCCACCATGATTT 161

Qy 1551 TCAGCAGTGTGTGCTACTGG 1570
        ||||||||||||||||||||
Db  162 TCAGCAGTGTGTGCTACTGG 181

Search completed: February 27, 2004, 07:11:39 
Job time : 541.721 secs



